Privacy-Preserving Data Mining

Author(s):  
Alexandre Evfimievski ◽  
Tyrone Grandison

Privacy-preserving data mining (PPDM) refers to the area of data mining that seeks to safeguard sensitive information from unsolicited or unsanctioned disclosure. Most traditional data mining techniques analyze and model the data set statistically, in aggregated form, while privacy preservation is primarily concerned with protecting against disclosure of individual data records. This domain separation points to the technical feasibility of PPDM. Historically, issues related to PPDM were first studied by national statistical agencies interested in collecting private social and economic data, such as census and tax records, and making it available for analysis by public servants, companies, and researchers. Building accurate socioeconomic models is vital for business planning and public policy. Yet there is no way of knowing in advance what models may be needed, nor is it feasible for the statistical agency to perform all data processing for everyone, playing the role of a trusted third party. Instead, the agency provides the data in a sanitized form that allows statistical processing and protects the privacy of individual records, solving a problem known as privacy-preserving data publishing. For a survey of work in statistical databases, see Adam and Wortmann (1989) and Willenborg and de Waal (2001).
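The sanitization idea sketched above can be illustrated with the classic randomized-response technique long used in statistical disclosure control: each respondent perturbs their own answer, yet aggregate statistics remain recoverable. A minimal sketch in Python (function names and parameters are illustrative, not drawn from the survey):

```python
import random

def randomized_response(truth, p, rng):
    """Report the true answer with probability p; otherwise flip a fair coin."""
    return truth if rng.random() < p else rng.random() < 0.5

def estimate_true_rate(responses, p):
    """Invert E[observed] = p * true_rate + (1 - p) * 0.5 for the true rate."""
    observed = sum(responses) / len(responses)
    return (observed - (1 - p) * 0.5) / p

rng = random.Random(0)
# Simulate 100,000 respondents, 30% of whom hold the sensitive attribute.
truths = [rng.random() < 0.3 for _ in range(100_000)]
answers = [randomized_response(t, p=0.75, rng=rng) for t in truths]
estimate = estimate_true_rate(answers, p=0.75)
```

No individual answer can be trusted on its own, yet the aggregate rate is recovered to within sampling error, which is exactly the aggregate-versus-individual separation the paragraph above describes.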

Privacy-preserving data mining (PPDM) is one of the important research areas of recent years, ensuring that sensitive information and rules are not revealed. Several methods and techniques have been proposed to hide sensitive information and rules in databases. In the past, perturbation-based PPDM was developed to preserve privacy before use, and secure mining of association rules was performed in horizontally distributed databases. This paper presents an integrated model that addresses the multi-objective problem of data and rule hiding through reinforcement and discrete optimization for data publishing, denoted the integrated Reinforced Social Ant and Discrete Swarm Optimization (RSA-DSO) model. In the RSA-DSO model, both the Reinforced Social Ant and the Discrete Swarm Optimization components operate on the same particles. First, sensitive data items are hidden using the Reinforced Social Ant model. Next, sensitive rules are identified and further hidden for data publishing using the Discrete Swarm Optimization model. To evaluate the RSA-DSO model, it was tested on a benchmark dataset. The results show that the RSA-DSO model improves privacy-preservation accuracy with minimal time for optimal hiding, while also optimizing the generation of sensitive rules.
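Optimization-based hiding methods of this kind search over candidate sanitizations and score each with a fitness that trades privacy against side effects. The following toy sketch, with a plain random search standing in for the ant/swarm update rules and all names, weights, and thresholds chosen for illustration only, shows the kind of objective such a model minimizes:

```python
import random

def fitness(mask, transactions, sensitive_item, min_support=0.4,
            w_fail=0.6, w_loss=0.4):
    """Cost of a candidate sanitization; mask[i] == 1 deletes transaction i.

    A hiding failure occurs if the sensitive item is still frequent after
    deletion; the data-loss term penalizes deleting too many transactions.
    """
    kept = [t for t, m in zip(transactions, mask) if not m]
    if not kept:
        return float("inf")
    support = sum(sensitive_item in t for t in kept) / len(kept)
    failure = 1.0 if support >= min_support else 0.0
    loss = sum(mask) / len(mask)
    return w_fail * failure + w_loss * loss

transactions = [{"a", "b"}, {"a", "c"}, {"b", "c"}, {"a"}, {"b"}]
rng = random.Random(1)
candidates = [tuple(rng.randint(0, 1) for _ in transactions) for _ in range(200)]
best = min(candidates, key=lambda m: fitness(m, transactions, "a"))
```

Swarm- and ant-based models differ from this sketch in how candidate masks are generated and updated between iterations, not in the shape of the objective.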


2008 ◽  
pp. 2379-2401 ◽  
Author(s):  
Igor Nai Fovino

Intense work in the area of data mining technology and in its applications to several domains has resulted in the development of a large variety of techniques and tools able to automatically and intelligently transform large amounts of data into knowledge relevant to users. However, as with other kinds of useful technologies, the knowledge discovery process can be misused. It can be used, for example, by malicious subjects to reconstruct sensitive information for which they have no explicit access authorization. This type of “attack” cannot easily be detected because the data used to guess the protected information is usually freely accessible. For this reason, many research efforts have recently been devoted to addressing the problem of privacy preservation in data mining. The mission of this chapter is therefore to introduce the reader to this research field and to provide the proper instruments (in terms of concepts, techniques, and examples) to allow a critical comprehension of the advantages, the limitations, and the open issues of privacy-preserving data mining techniques.


2016 ◽  
Vol 7 (3) ◽  
pp. 1-9 ◽  
Author(s):  
Sahar A. El-Rahman Ismail ◽  
Dalal Al Makhdhub ◽  
Amal A. Al Qahtani ◽  
Ghadah A. Al Shabanat ◽  
Nouf M. Omair ◽  
...  

We live in an information era where sensitive information extracted from data mining systems is vulnerable to exploitation. Privacy-preserving data mining aims to prevent the discovery of sensitive information. Information hiding systems provide excellent privacy and confidentiality, and securing confidential communications over public channels can be achieved using steganography. Steganography techniques exploit cover media, hiding the payload's existence within appropriate multimedia carriers. This paper studies steganography techniques in the spatial and frequency domains, and then analyzes the performance of Discrete Cosine Transform (DCT) based steganography using the low-frequency and middle-frequency bands, comparing them by Peak Signal-to-Noise Ratio (PSNR) and Mean Square Error (MSE). The experimental results show that the middle-frequency band offers the larger message capacity and the best performance.
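The two quality measures used in the comparison are straightforward to compute. A minimal pure-Python sketch (pixel values are illustrative; real evaluations run over whole images):

```python
import math

def mse(cover, stego):
    """Mean squared error between two equal-length pixel sequences."""
    return sum((c - s) ** 2 for c, s in zip(cover, stego)) / len(cover)

def psnr(cover, stego, max_val=255.0):
    """Peak signal-to-noise ratio in dB; higher means less visible change."""
    e = mse(cover, stego)
    return float("inf") if e == 0 else 10 * math.log10(max_val ** 2 / e)

cover = [100, 120, 130, 140]           # original pixel values
stego = [100, 121, 130, 141]           # two pixels nudged by the embedding
```

A stego image that changes few coefficients by small amounts keeps MSE low and PSNR high, which is why these two metrics are the standard yardstick for comparing embedding bands.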


Author(s):  
G. Bhavani ◽  
S. Sivakumari

The data mining process extracts useful information from a large amount of data. The most interesting part of data mining is discovering unseen patterns without exposing sensitive knowledge. Privacy-preserving data mining, abbreviated PPDM, deals with the issue of sustaining the privacy of information, shielding sensitive information from disclosure. PPDM techniques are designed to hide sensitive information even after data mining is performed. One practice for hiding sensitive association rules is termed association rule hiding. The main objective of an association rule hiding algorithm is to slightly adjust the original database so that no sensitive association rule can be derived from it. The following article presents a detailed survey of various association rule hiding techniques for preserving privacy in data mining. First, the techniques developed by previous researchers are studied in detail. Then, a comparative analysis is carried out to identify the limitations of each technique, followed by suggestions for future improvement of association rule hiding for privacy preservation.
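As a concrete illustration of the distortion-based hiding idea described above, the sketch below removes the consequent item from supporting transactions until a sensitive rule's confidence falls below the mining threshold. Function names and the victim-selection order are illustrative; the algorithms in the survey choose which transactions to modify far more carefully to limit side effects.

```python
def confidence(transactions, lhs, rhs):
    """conf(lhs -> rhs) = support(lhs ∪ rhs) / support(lhs)."""
    lhs_count = sum(lhs <= t for t in transactions)
    both = sum((lhs | rhs) <= t for t in transactions)
    return both / lhs_count if lhs_count else 0.0

def hide_rule(transactions, lhs, rhs, min_conf=0.6):
    """Drop the consequent from supporting transactions, one at a time,
    until the sensitive rule can no longer be mined at min_conf."""
    db = [set(t) for t in transactions]
    for t in db:
        if confidence(db, lhs, rhs) < min_conf:
            break                  # rule is already hidden
        if (lhs | rhs) <= t:
            t -= rhs               # sanitize this supporting transaction
    return db
```

Each removal lowers the rule's support and confidence while leaving unrelated transactions untouched, which is the "slight adjustment" the abstract refers to.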


2014 ◽  
Vol 2014 ◽  
pp. 1-12 ◽  
Author(s):  
Chun-Wei Lin ◽  
Tzung-Pei Hong ◽  
Hung-Chuan Hsu

Data mining is traditionally adopted to retrieve and analyze knowledge from large amounts of data. Private or confidential data may be sanitized or suppressed before it is shared or published in public. Privacy-preserving data mining (PPDM) has thus become an important issue in recent years. The most general way of PPDM is to sanitize the database to hide the sensitive information. In this paper, a novel hiding-missing-artificial utility (HMAU) algorithm is proposed to hide sensitive itemsets through transaction deletion. The transaction with the maximal ratio of sensitive to nonsensitive items is selected to be entirely deleted. Three side effects, namely hiding failures, missing itemsets, and artificial itemsets, are considered in evaluating whether transactions should be deleted to hide the sensitive itemsets. Three weights reflecting the importance of these factors are also assigned, and can be set according to users' requirements. Experiments are then conducted to show the performance of the proposed algorithm in execution time, number of deleted transactions, and number of side effects.
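The transaction-selection idea can be sketched as follows, with a simplified ratio standing in for HMAU's exact weighting (the +1 in the denominator and the stopping rule are illustrative assumptions, not the paper's definitions):

```python
def sensitive_ratio(transaction, sensitive_itemsets):
    """Ratio of sensitive to nonsensitive items in one transaction."""
    sensitive_items = set().union(*sensitive_itemsets)
    n_sens = len(transaction & sensitive_items)
    n_nonsens = len(transaction - sensitive_items)
    return n_sens / (n_nonsens + 1)    # +1 avoids division by zero

def support(db, itemset):
    """Fraction of transactions containing the itemset."""
    return sum(itemset <= t for t in db) / len(db)

def hide_by_deletion(db, sensitive_itemsets, min_support=0.5):
    """Repeatedly delete the supporting transaction with the highest
    sensitive ratio until every sensitive itemset is infrequent."""
    db = [set(t) for t in db]
    while any(support(db, s) >= min_support for s in sensitive_itemsets):
        victim = max((t for t in db if any(s <= t for s in sensitive_itemsets)),
                     key=lambda t: sensitive_ratio(t, sensitive_itemsets))
        db.remove(victim)
    return db
```

Deleting the transaction that is mostly sensitive items removes the most hiding benefit per unit of collateral damage, which is the intuition behind weighting the three side effects.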


2021 ◽  
Vol 10 (2) ◽  
pp. 78
Author(s):  
Songyuan Li ◽  
Hui Tian ◽  
Hong Shen ◽  
Yingpeng Sang

Publication of trajectory data that contains rich information about vehicles in the dimensions of time and space (location) enables online monitoring and supervision of vehicles in motion and offline traffic analysis for various management tasks. However, it also provides security holes for privacy breaches, as exposing individuals' private information to the public may result in attacks threatening their safety. Therefore, increased attention has recently been paid to the privacy protection of trajectory data publishing. Existing methods, such as generalization via anonymization and suppression via randomization, achieve protection by modifying the original trajectory to form a publishable trajectory, which results in significant data distortion and hence low data utility. In this work, we propose a trajectory privacy-preserving method called dynamic anonymization with bounded distortion. In our method, individual trajectories in the original trajectory set are mixed in a localized manner to form a synthetic trajectory data set with a bounded distortion for publishing, which protects the privacy of location information associated with individuals in the trajectory data set and ensures a guaranteed utility of the published data, both individually and collectively. Through experiments conducted on real trajectory data from Guangzhou City taxi statistics, we evaluate the performance of our proposed method and compare it with existing mainstream methods in terms of privacy preservation against attacks and trajectory data utilization. The results show that our proposed method achieves better data utilization than the existing methods using globally static anonymization, without trading off data security against attacks.
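A greatly simplified sketch of the localized-mixing idea with a distortion bound (a hypothetical illustration, not the paper's actual algorithm: points are swapped between paired trajectories only while each synthetic trajectory stays within the bound of its original):

```python
import math

def distortion(orig, synth):
    """Mean point-wise Euclidean distance between a trajectory and its
    synthetic counterpart."""
    return sum(math.dist(p, q) for p, q in zip(orig, synth)) / len(orig)

def mix_trajectories(trajs, bound=0.5):
    """Swap same-time points between consecutive trajectories, undoing any
    swap that would push a synthetic trajectory past the distortion bound."""
    synth = [list(t) for t in trajs]
    for i in range(0, len(synth) - 1, 2):
        a, b = synth[i], synth[i + 1]
        for k in range(len(a)):
            a[k], b[k] = b[k], a[k]            # tentative swap at time k
            if (distortion(trajs[i], a) > bound
                    or distortion(trajs[i + 1], b) > bound):
                a[k], b[k] = b[k], a[k]        # undo: bound would be exceeded
    return synth
```

Because every accepted swap keeps each synthetic trajectory within the bound, the published set carries a per-trajectory utility guarantee while individual location points no longer belong to their original owners.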


2010 ◽  
Vol 45 (1) ◽  
pp. 151-159 ◽  
Author(s):  
Michal Sramka

Many databases contain data about individuals that are valuable for research, marketing, and decision making. Sharing or publishing data about individuals is, however, prone to privacy attacks, breaches, and disclosures. The concern here is individuals' privacy: keeping sensitive information about individuals private to them. Data mining in this setting has been shown to be a powerful tool to breach privacy and make disclosures. In contrast, data mining can also be used in practice to aid data owners in deciding how to share and publish their databases. We present and discuss the role and uses of data mining in these scenarios and also briefly discuss other approaches to private data analysis.


2021 ◽  
Vol 9 (2) ◽  
pp. 131-135
Author(s):  
G. Srinivas Reddy, et al.

As the usage of the internet and web applications grows rapidly, the security and privacy of data is the most challenging issue we face, since data can easily be damaged. Various conventional techniques are used for privacy preservation, such as condensation, randomization, and tree structures. The limitation of the existing approaches is that they cannot maintain a proper balance between data utility and privacy, and they may suffer from privacy violations. This paper presents an Additive Rotation Perturbation approach for Privacy-Preserving Data Mining (PPDM). In the proposed work, various datasets were collected from the UCI Machine Learning Repository and protected with a new Additive Rotational Perturbation technique under Privacy-Preserving Data Mining. Experimental results show that the proposed algorithm's strength is high for all the datasets, as estimated using the Difference of Variance (DoV) method.
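The core idea, rotation to preserve geometry plus additive noise to mask raw values, can be sketched as follows (a hypothetical two-dimensional illustration, not the paper's exact algorithm; the angle and noise scale are assumed parameters):

```python
import math
import random

def rotate_perturb(points, angle, noise_scale=0.05, rng=None):
    """Rotate 2-D records by a secret angle and add small uniform noise.

    Rotation preserves pairwise distances, so distance-based mining results
    largely survive, while the secret angle and the additive noise hide the
    original attribute values.
    """
    rng = rng or random.Random(7)
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + rng.uniform(-noise_scale, noise_scale),
             s * x + c * y + rng.uniform(-noise_scale, noise_scale))
            for x, y in points]

data = [(1.0, 2.0), (3.0, 1.0), (2.0, 4.0)]
masked = rotate_perturb(data, angle=math.pi / 3)
```

Because pairwise distances change only by the small noise, clustering and classification on the masked data stay close to results on the original, which is the utility/privacy balance such perturbation schemes aim for.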

