INFORMATION LOSS FOR SYNTHETIC DATA THROUGH FUZZY CLUSTERING

Author(s):  
SUSANA LADRA ◽  
VICENÇ TORRA

Synthetic data generators are one of the methods used in privacy preserving data mining for ensuring the privacy of the individuals when their data are published. Synthetic data generators construct artificial data from some models obtained from the original data. Such models are mainly based on statistics and, typically, do not take into account other aspects of interest in artificial intelligence. In this paper we study whether one family of such synthetic data generators (the IPSO family) preserves the properties of the data that are of interest when users plan to apply clustering techniques. In particular, we study the effect of such synthetic data generators on fuzzy clustering. That is, we study the information loss data suffer when the original data are replaced by the synthetic ones.

2013 ◽  
Vol 2013 ◽  
pp. 1-6 ◽  
Author(s):  
Gábor Szűcs

The paper deals with classification in privacy-preserving data mining. An algorithm, the Random Response Forest, is introduced constructing many binary decision trees, as an extension of Random Forest for privacy-preserving problems. Random Response Forest uses the Random Response idea among the anonymization methods, which instead of generalization keeps the original data, but mixes them. An anonymity metric is defined for undistinguishability of two mixed sets of data. This metric, the binary anonymity, is investigated and taken into consideration for optimal coding of the binary variables. The accuracy of Random Response Forest is presented at the end of the paper.


2014 ◽  
Vol 2014 ◽  
pp. 1-12 ◽  
Author(s):  
Pawan R. Bhaladhare ◽  
Devesh C. Jinwala

In privacy preserving data mining, the l-diversity and k-anonymity models are the most widely used for preserving the sensitive private information of an individual. Out of these two, l-diversity model gives better privacy and lesser information loss as compared to the k-anonymity model. In addition, we observe that numerous clustering algorithms have been proposed in data mining, namely, k-means, PSO, ACO, and BFO. Amongst them, the BFO algorithm is more stable and faster as compared to all others except k-means. However, BFO algorithm suffers from poor convergence behavior as compared to other optimization algorithms. We also observed that the current literature lacks any approaches that apply BFO with l-diversity model to realize privacy preservation in data mining. Motivated by this observation, we propose here an approach that uses fractional calculus (FC) in the chemotaxis step of the BFO algorithm. The FC is used to boost the computational performance of the algorithm. We also evaluate our proposed FC-BFO and BFO algorithms empirically, focusing on information loss and execution time as vital metrics. The experimental evaluation shows that our proposed FC-BFO algorithm derives an optimal cluster as compared to the original BFO algorithm and existing clustering algorithms.


2013 ◽  
Vol 6 (3) ◽  
pp. 370-378
Author(s):  
Meenakshi Vishnoi ◽  
Seeja K. R

Data mining is a very active research area that deals with the extraction of  knowledge from very large databases. Data mining has made knowledge extraction and decision making easy. The extracted knowledge could reveal the personal information , if the data contains various private and sensitive attributes about an individual. This poses a threat to the personal information as there is a possibility of misusing the information behind the scenes without the knowledge of the individual. So, privacy becomes a great concern for the data owners and the organizations  as none of the organizations would like to share their data. To solve this problem Privacy Preserving Data Mining technique have emerged and also solved problems of various domains as it provides the benefit of data mining without compromising the privacy of an individual. This paper proposes a privacy preserving data mining technique the uses randomized perturbation and cryptographic technique. The performance evaluation of the proposed technique shows the same result with the modified data and the original data.


Author(s):  
Jun Zhang ◽  
Jie Wang ◽  
Shuting Xu

Data mining technologies have now been used in commercial, industrial, and governmental businesses, for various purposes, ranging from increasing profitability to enhancing national security. The widespread applications of data mining technologies have raised concerns about trade secrecy of corporations and privacy of innocent people contained in the datasets collected and used for the data mining purpose. It is necessary that data mining technologies designed for knowledge discovery across corporations and for security purpose towards general population have sufficient privacy awareness to protect the corporate trade secrecy and individual private information. Unfortunately, most standard data mining algorithms are not very efficient in terms of privacy protection, as they were originally developed mainly for commercial applications, in which different organizations collect and own their private databases, and mine their private databases for specific commercial purposes. In the cases of inter-corporation and security data mining applications, data mining algorithms may be applied to datasets containing sensitive or private information. Data warehouse owners and government agencies may potentially have access to many databases collected from different sources and may extract any information from these databases. This potentially unlimited access to data and information raises the fear of possible abuse and promotes the call for privacy protection and due process of law. Privacy-preserving data mining techniques have been developed to address these concerns (Fung et al., 2007; Zhang, & Zhang, 2007). The general goal of the privacy-preserving data mining techniques is defined as to hide sensitive individual data values from the outside world or from unauthorized persons, and simultaneously preserve the underlying data patterns and semantics so that a valid and efficient decision model based on the distorted data can be constructed. In the best scenarios, this new decision model should be equivalent to or even better than the model using the original data from the viewpoint of decision accuracy. There are currently at least two broad classes of approaches to achieving this goal. The first class of approaches attempts to distort the original data values so that the data miners (analysts) have no means (or greatly reduced ability) to derive the original values of the data. The second is to modify the data mining algorithms so that they allow data mining operations on distributed datasets without knowing the exact values of the data or without direct accessing the original datasets. This article only discusses the first class of approaches. Interested readers may consult (Clifton et al., 2003) and the references therein for discussions on distributed data mining approaches.


2018 ◽  
Vol 8 (1) ◽  
Author(s):  
Desmond Ko Khang Siang ◽  
Siti Hajar Othman ◽  
Raja Zahilah Raja Mohd Radzi

Data Mining is a computational process that able to identify patterns, trends and behaviour from large datasets. With this advantages, data mining has been applied in many fields such as finance, healthcare, retail and so on. However, information disclosure become one of an issue during data mining process. Therefore, privacy protection is needed during data mining process which known as Privacy Preserving Data Mining (PPDM). There are several techniques available in PPDM and each of the techniques has its’ own benefits and drawbacks. In this research, perturbation technique is selected as privacy preserving technique. Perturbation technique is a method that alters the original data value before the application of data mining. In PPDM applications, perturbation technique able to provide a protection of data privacy but the accuracy of data should not be ignored too. In this research, three perturbation techniques are selected which are additive noise, data swapping and resample. For data mining techniques, two methods of classification are selected which are Naïve Bayes and Support Vector Machines (SVM). With the selection of these techniques, the experimental results are evaluated based on the hiding failure, accuracy and precision. For overall result, resample is selected as the best perturbation technique in naïve bayes and SVM classification for both glass and ionosphere datasets.


2014 ◽  
Vol 10 (1) ◽  
pp. 55-76 ◽  
Author(s):  
Mohammad Reza Keyvanpour ◽  
Somayyeh Seifi Moradi

In this study, a new model is provided for customized privacy in privacy preserving data mining in which the data owners define different levels for privacy for different features. Additionally, in order to improve perturbation methods, a method combined of singular value decomposition (SVD) and feature selection methods is defined so as to benefit from the advantages of both domains. Also, to assess the amount of distortion created by the proposed perturbation method, new distortion criteria are defined in which the amount of created distortion in the process of feature selection is considered based on the value of privacy in each feature. Different tests and results analysis show that offered method based on this model compared to previous approaches, caused the improved privacy, accuracy of mining results and efficiency of privacy preserving data mining systems.


Sign in / Sign up

Export Citation Format

Share Document