INFORMATION LOSS FOR SYNTHETIC DATA THROUGH FUZZY CLUSTERING

Synthetic data generators are one of the methods used in privacy-preserving data mining to protect the privacy of individuals when their data are published. They construct artificial data from models obtained from the original data. Such models are mainly based on statistics and, typically, do not take into account other aspects of interest in artificial intelligence. In this paper we study whether one family of such synthetic data generators (the IPSO family) preserves the properties of the data that are of interest when users plan to apply clustering techniques. In particular, we study the effect of such synthetic data generators on fuzzy clustering. That is, we study the information loss incurred when the original data are replaced by the synthetic ones.

Random Response Forest for Privacy-Preserving Classification

Journal of Computational Engineering ◽

10.1155/2013/397096 ◽

2013 ◽

Vol 2013 ◽

pp. 1-6 ◽

Cited By ~ 3

Author(s):

Gábor Szűcs

Keyword(s):

Data Mining ◽

Random Forest ◽

Decision Trees ◽

Original Data ◽

Privacy Preserving ◽

Random Response ◽

Privacy Preserving Data Mining ◽

Binary Variables ◽

Binary Decision ◽

Optimal Coding

The paper deals with classification in privacy-preserving data mining. It introduces an algorithm, the Random Response Forest, which constructs many binary decision trees as an extension of Random Forest to privacy-preserving problems. Among the anonymization methods, Random Response Forest uses the Random Response idea, which keeps the original data instead of generalizing them, but mixes them. An anonymity metric is defined for the indistinguishability of two mixed sets of data. This metric, the binary anonymity, is investigated and taken into account for the optimal coding of the binary variables. The accuracy of Random Response Forest is presented at the end of the paper.
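The abstract does not spell out how the data are mixed, so the following is only a minimal sketch of the classic randomized response idea for one binary variable, not the Random Response Forest algorithm itself. The function names and the `p_keep` parameter are assumptions made for illustration: each bit is kept with probability `p_keep` and flipped otherwise, and the original share of ones is then estimated from the mixed data.

```python
import random


def randomize_bits(bits, p_keep, rng):
    """Keep each bit with probability p_keep, otherwise flip it."""
    return [b if rng.random() < p_keep else 1 - b for b in bits]


def estimate_true_rate(reported, p_keep):
    """Estimate the original share of ones from the randomized bits.

    The expected reported share is
        p_keep * pi + (1 - p_keep) * (1 - pi),
    which is solved here for the true share pi.
    """
    if p_keep == 0.5:
        raise ValueError("p_keep = 0.5 leaves no information to recover")
    observed = sum(reported) / len(reported)
    return (observed - (1 - p_keep)) / (2 * p_keep - 1)


# Hypothetical data: 30% of the original bits are ones.
rng = random.Random(0)
original = [1] * 3000 + [0] * 7000
reported = randomize_bits(original, p_keep=0.8, rng=rng)
estimate = estimate_true_rate(reported, p_keep=0.8)
print(f"estimated share of ones: {estimate:.3f}")
```

No single reported bit can be trusted, yet the aggregate share stays recoverable; values of `p_keep` closer to 0.5 give stronger protection at the cost of a noisier estimate.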

Evaluation of information loss for privacy preserving data mining through comparison of fuzzy partitions

International Conference on Fuzzy Systems ◽

10.1109/fuzzy.2010.5584186 ◽

2010 ◽

Cited By ~ 5

Author(s):

Isaac Cano ◽

Susana Ladra ◽

Vicenc Torra

Keyword(s):

Data Mining ◽

Privacy Preserving ◽

Information Loss ◽

Privacy Preserving Data Mining ◽

Fuzzy Partitions

A Clustering Approach for the l-Diversity Model in Privacy Preserving Data Mining Using Fractional Calculus-Bacterial Foraging Optimization Algorithm

Advances in Computer Engineering ◽

10.1155/2014/396529 ◽

2014 ◽

Vol 2014 ◽

pp. 1-12 ◽

Cited By ~ 23

Author(s):

Pawan R. Bhaladhare ◽

Devesh C. Jinwala

Keyword(s):

Data Mining ◽

Fractional Calculus ◽

Private Information ◽

Privacy Preservation ◽

Clustering Algorithms ◽

Privacy Preserving ◽

Information Loss ◽

Bacterial Foraging Optimization ◽

Privacy Preserving Data Mining ◽

Computational Performance

In privacy preserving data mining, the l-diversity and k-anonymity models are the most widely used for preserving the sensitive private information of an individual. Of these two, the l-diversity model gives better privacy and less information loss than the k-anonymity model. In addition, we observe that numerous clustering algorithms have been proposed in data mining, namely, k-means, PSO, ACO, and BFO. Among them, the BFO algorithm is more stable and faster than all others except k-means. However, the BFO algorithm suffers from poor convergence behavior compared to other optimization algorithms. We also observed that the current literature lacks any approaches that apply BFO with the l-diversity model to realize privacy preservation in data mining. Motivated by this observation, we propose here an approach that uses fractional calculus (FC) in the chemotaxis step of the BFO algorithm. The FC is used to boost the computational performance of the algorithm. We also evaluate our proposed FC-BFO and BFO algorithms empirically, focusing on information loss and execution time as vital metrics. The experimental evaluation shows that our proposed FC-BFO algorithm yields better clusters than the original BFO algorithm and existing clustering algorithms.
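To make the two privacy models concrete, here is a minimal sketch that checks k-anonymity and (distinct) l-diversity on an already generalized table. It does not implement the FC-BFO clustering from the paper; the record layout, the attribute names, and the choice of the "distinct values" variant of l-diversity are assumptions made for illustration.

```python
from collections import defaultdict


def group_by_quasi_identifiers(records, quasi_identifiers):
    """Group records that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[q] for q in quasi_identifiers)
        groups[key].append(rec)
    return groups


def is_k_anonymous(records, quasi_identifiers, k):
    """Every group must contain at least k records."""
    groups = group_by_quasi_identifiers(records, quasi_identifiers)
    return all(len(g) >= k for g in groups.values())


def is_l_diverse(records, quasi_identifiers, sensitive, l):
    """Every group must contain at least l distinct sensitive values."""
    groups = group_by_quasi_identifiers(records, quasi_identifiers)
    return all(len({r[sensitive] for r in g}) >= l for g in groups.values())


# Hypothetical generalized table with two groups of two records each.
records = [
    {"age": "30-39", "zip": "130**", "disease": "flu"},
    {"age": "30-39", "zip": "130**", "disease": "flu"},
    {"age": "40-49", "zip": "148**", "disease": "cancer"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
]

print(is_k_anonymous(records, ["age", "zip"], k=2))           # True
print(is_l_diverse(records, ["age", "zip"], "disease", l=2))  # False
```

The table is 2-anonymous but not 2-diverse: everyone in the first group has the same disease, so membership in that group alone reveals the sensitive value. This is the gap that l-diversity is meant to close.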

Privacy Preserving Data Mining using Attribute Encryption and Data Perturbation

INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY ◽

10.24297/ijct.v6i3.4461 ◽

2013 ◽

Vol 6 (3) ◽

pp. 370-378

Author(s):

Meenakshi Vishnoi ◽

Seeja K. R

Keyword(s):

Data Mining ◽

Personal Information ◽

Original Data ◽

Research Area ◽

Privacy Preserving ◽

Data Mining Technique ◽

Privacy Preserving Data Mining ◽

Mining Technique ◽

Active Research ◽

The Individual

Data mining is a very active research area that deals with the extraction of knowledge from very large databases. Data mining has made knowledge extraction and decision making easy. The extracted knowledge could reveal personal information if the data contain various private and sensitive attributes about an individual. This poses a threat to personal information, as the information may be misused behind the scenes without the knowledge of the individual. Privacy therefore becomes a great concern for data owners and organizations, as none of the organizations would like to share their data. To solve this problem, Privacy Preserving Data Mining techniques have emerged and have also solved problems in various domains, as they provide the benefit of data mining without compromising the privacy of an individual. This paper proposes a privacy preserving data mining technique that uses randomized perturbation and a cryptographic technique. The performance evaluation of the proposed technique shows the same result with the modified data and the original data.

Matrix Decomposition Techniques for Data Privacy

Encyclopedia of Data Warehousing and Mining, Second Edition ◽

10.4018/978-1-60566-010-3.ch185 ◽

2011 ◽

pp. 1188-1193

Author(s):

Jun Zhang ◽

Jie Wang ◽

Shuting Xu

Keyword(s):

Data Mining ◽

Private Information ◽

Privacy Protection ◽

Decision Model ◽

Original Data ◽

Privacy Preserving ◽

Privacy Preserving Data Mining ◽

Data Mining Techniques ◽

Data Mining Algorithms ◽

Mining Algorithms

Data mining technologies have now been used in commercial, industrial, and governmental businesses, for various purposes, ranging from increasing profitability to enhancing national security. The widespread applications of data mining technologies have raised concerns about trade secrecy of corporations and privacy of innocent people contained in the datasets collected and used for the data mining purpose. It is necessary that data mining technologies designed for knowledge discovery across corporations and for security purposes towards the general population have sufficient privacy awareness to protect corporate trade secrecy and individual private information. Unfortunately, most standard data mining algorithms are not very efficient in terms of privacy protection, as they were originally developed mainly for commercial applications, in which different organizations collect and own their private databases, and mine their private databases for specific commercial purposes. In the cases of inter-corporation and security data mining applications, data mining algorithms may be applied to datasets containing sensitive or private information. Data warehouse owners and government agencies may potentially have access to many databases collected from different sources and may extract any information from these databases. This potentially unlimited access to data and information raises the fear of possible abuse and promotes the call for privacy protection and due process of law. Privacy-preserving data mining techniques have been developed to address these concerns (Fung et al., 2007; Zhang & Zhang, 2007). The general goal of privacy-preserving data mining techniques is to hide sensitive individual data values from the outside world or from unauthorized persons, and simultaneously preserve the underlying data patterns and semantics so that a valid and efficient decision model based on the distorted data can be constructed. In the best scenarios, this new decision model should be equivalent to or even better than the model using the original data from the viewpoint of decision accuracy. There are currently at least two broad classes of approaches to achieving this goal. The first class of approaches attempts to distort the original data values so that the data miners (analysts) have no means (or greatly reduced ability) to derive the original values of the data. The second is to modify the data mining algorithms so that they allow data mining operations on distributed datasets without knowing the exact values of the data or without directly accessing the original datasets. This article only discusses the first class of approaches. Interested readers may consult (Clifton et al., 2003) and the references therein for discussions on distributed data mining approaches.

Comparative Study on Perturbation Techniques in Privacy Preserving Data Mining on Two Numeric Datasets

International Journal of Innovative Computing ◽

10.11113/ijic.v8n1.161 ◽

2018 ◽

Vol 8 (1) ◽

Author(s):

Desmond Ko Khang Siang ◽

Siti Hajar Othman ◽

Raja Zahilah Raja Mohd Radzi

Keyword(s):

Data Mining ◽

Data Privacy ◽

Naive Bayes ◽

Perturbation Technique ◽

Original Data ◽

Privacy Preserving ◽

Naïve Bayes ◽

Support Vector ◽

Perturbation Techniques ◽

Privacy Preserving Data Mining

Data mining is a computational process that can identify patterns, trends and behaviour in large datasets. Because of this advantage, data mining has been applied in many fields such as finance, healthcare and retail. However, information disclosure has become an issue in the data mining process. Privacy protection is therefore needed during data mining, which is known as Privacy Preserving Data Mining (PPDM). Several techniques are available in PPDM, and each has its own benefits and drawbacks. In this research, the perturbation technique is selected as the privacy preserving technique. Perturbation alters the original data values before data mining is applied. In PPDM applications, perturbation can protect data privacy, but the accuracy of the data should not be ignored either. Three perturbation techniques are selected: additive noise, data swapping and resample. Two classification methods are selected as data mining techniques: Naïve Bayes and Support Vector Machines (SVM). The experimental results are evaluated in terms of hiding failure, accuracy and precision. Overall, resample is selected as the best perturbation technique for Naïve Bayes and SVM classification on both the glass and ionosphere datasets.
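Two of the perturbation techniques named above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the experimental setup of the paper: the helper names, the Gaussian noise model, the `sigma` value and the toy rows are all hypothetical, and the resample technique is left out because the abstract does not define it.

```python
import random


def additive_noise(values, sigma, rng):
    """Add zero-mean Gaussian noise to every numeric value."""
    return [v + rng.gauss(0.0, sigma) for v in values]


def swap_column(rows, column, rng):
    """Shuffle one column across rows, leaving the other columns as they are.

    The multiset of values in the column is preserved, so its marginal
    statistics do not change, but the link to each individual row is broken.
    """
    shuffled = [row[column] for row in rows]
    rng.shuffle(shuffled)
    return [{**row, column: new} for row, new in zip(rows, shuffled)]


# Hypothetical numeric records.
rng = random.Random(42)
rows = [
    {"age": 25, "income": 31000},
    {"age": 37, "income": 52000},
    {"age": 48, "income": 47000},
    {"age": 59, "income": 68000},
]

noisy_incomes = additive_noise([r["income"] for r in rows], sigma=1000.0, rng=rng)
swapped = swap_column(rows, "income", rng)
print(noisy_incomes)
print(swapped)
```

Additive noise changes every value slightly, while swapping keeps every value exact but reassigns it; this is one reason the two techniques can trade off privacy and classifier accuracy differently.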

A framework for ensemble classification and sensitivity analysis in privacy preserving data mining

International Journal of Computational Systems Engineering ◽

10.1504/ijcsyse.2019.103637 ◽

2019 ◽

Vol 5 (5/6) ◽

pp. 260-276

Author(s):

P. Chandrakanth ◽

M.S. Anbarasi

Keyword(s):

Data Mining ◽

Sensitivity Analysis ◽

Privacy Preserving ◽

Ensemble Classification ◽

Privacy Preserving Data Mining

Classification and Evaluation of Privacy Preserving Data Mining Methods

2020 11th International Conference on Information and Knowledge Technology (IKT) ◽

10.1109/ikt51791.2020.9345620 ◽

2020 ◽

Author(s):

Negar Nasiri ◽

MohammadReza Keyvanpour

Keyword(s):

Data Mining ◽

Privacy Preserving ◽

Privacy Preserving Data Mining ◽

Mining Methods

Analyzing and Performing Privacy Preserving Data Mining on Medical Databases

Indian Journal of Science and Technology ◽

10.17485/ijst/2016/v9i17/93024 ◽

2016 ◽

Vol 9 (17) ◽

Author(s):

D. Aruna Kumari ◽

Y. Vineela ◽

T. Mohan Krishna ◽

B. Sai Kumar

Keyword(s):

Data Mining ◽

Privacy Preserving ◽

Privacy Preserving Data Mining ◽

Medical Databases

A Perturbation Method Based on Singular Value Decomposition and Feature Selection for Privacy Preserving Data Mining

International Journal of Data Warehousing and Mining ◽

10.4018/ijdwm.2014010104 ◽

2014 ◽

Vol 10 (1) ◽

pp. 55-76 ◽

Cited By ~ 1

Author(s):

Mohammad Reza Keyvanpour ◽

Somayyeh Seifi Moradi

Keyword(s):

Data Mining ◽

Feature Selection ◽

Singular Value Decomposition ◽

Perturbation Method ◽

Privacy Preserving ◽

Singular Value ◽

Privacy Preserving Data Mining ◽

Selection For ◽

Value Decomposition ◽

Different Levels

In this study, a new model is provided for customized privacy in privacy preserving data mining, in which the data owners define different levels of privacy for different features. Additionally, in order to improve perturbation methods, a method combining singular value decomposition (SVD) and feature selection is defined so as to benefit from the advantages of both domains. Also, to assess the amount of distortion created by the proposed perturbation method, new distortion criteria are defined in which the amount of distortion created in the process of feature selection is considered based on the value of privacy in each feature. Tests and analysis of the results show that, compared to previous approaches, the proposed method based on this model improves privacy, the accuracy of mining results and the efficiency of privacy preserving data mining systems.
