Privacy Preserving Data Mining, Concepts, Techniques, and Evaluation Methodologies

Author(s):

Igor Nai Fovino

Keyword(s):

Data Mining

Knowledge Discovery

Privacy Preserving

Research Field

Sensitive Information

Discovery Process

Mining Technology

Evaluation Methodologies

New Research

Intense work in the area of data mining technology and in its applications to several domains has resulted in the development of a large variety of techniques and tools able to automatically and intelligently transform large amounts of data into knowledge relevant to users. However, as with other kinds of useful technologies, the knowledge discovery process can be misused. It can be used, for example, by malicious subjects to reconstruct sensitive information for which they do not have explicit access authorization. This type of “attack” cannot easily be detected, because the data used to infer the protected information is usually freely accessible. For this reason, many research efforts have recently been devoted to the problem of privacy preservation in data mining. The aim of this chapter is therefore to introduce the reader to this new research field and to provide the proper instruments (in terms of concepts, techniques, and examples) for a critical understanding of the advantages, limitations, and open issues of Privacy Preserving Data Mining techniques.

DCT Image Steganography Analysis for Privacy Preserving Data Mining

International Journal of Technology Diffusion

10.4018/ijtd.2016070101

2016

Vol 7 (3)

pp. 1-9

Author(s):

Sahar A. El-Rahman Ismail

Dalal Al Makhdhub

Amal A. Al Qahtani

Ghadah A. Al Shabanat

Nouf M. Omair

...

Keyword(s):

Data Mining

Information Hiding

Signal To Noise Ratio

Low Frequency

Privacy Preserving

Sensitive Information

Signal To Noise

Frequency Domains

Privacy And Confidentiality

We live in an information era where sensitive information extracted from data mining systems is vulnerable to exploitation. Privacy preserving data mining aims to prevent the discovery of sensitive information. Information hiding systems provide excellent privacy and confidentiality, and steganography can be used to secure confidential communications over public channels. Steganography techniques exploit cover media, hiding the payload's existence within appropriate multimedia carriers. This paper studies steganography techniques in the spatial and frequency domains. It then analyzes the performance of Discrete Cosine Transform (DCT) based steganography that embeds in the low-frequency or the middle-frequency coefficients, and compares the two using Peak Signal to Noise Ratio (PSNR) and Mean Square Error (MSE). The experimental results show that the middle frequency offers the larger message capacity and the better performance.
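The two quality metrics used in this comparison can be computed as shown below. This is a minimal sketch that assumes 8-bit grayscale images stored as lists of pixel rows. The function names `mse` and `psnr` and the toy pixel values are illustrative and do not come from the paper. PSNR is 10 · log10(MAX² / MSE) dB, so less embedding distortion (a lower MSE) gives a higher PSNR.

```python
import math


def mse(cover, stego):
    """Mean squared error between two equally sized grayscale images
    given as lists of pixel rows."""
    if len(cover) != len(stego) or any(len(a) != len(b) for a, b in zip(cover, stego)):
        raise ValueError("images must have identical dimensions")
    squared_errors = [(a - b) ** 2
                      for row_a, row_b in zip(cover, stego)
                      for a, b in zip(row_a, row_b)]
    if not squared_errors:
        raise ValueError("images must not be empty")
    return sum(squared_errors) / len(squared_errors)


def psnr(cover, stego, max_value=255):
    """Peak signal-to-noise ratio in dB; higher means less embedding distortion."""
    error = mse(cover, stego)
    if error == 0:
        return math.inf  # identical images: no distortion at all
    return 10 * math.log10(max_value ** 2 / error)


# Toy 3x3 example: the stego image differs from the cover in two pixels by 1.
cover = [[52, 55, 61], [59, 79, 61], [76, 62, 59]]
stego = [[52, 54, 61], [59, 79, 62], [76, 62, 59]]
print(mse(cover, stego), psnr(cover, stego))
```

In a real comparison, `stego` would be produced by embedding the same payload in the low-frequency or the middle-frequency DCT coefficients. Each result would then be scored against the same cover image.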

Comprehensive Survey on Privacy Preserving Association Rule Mining: Models, Approaches, Techniques and Algorithms

International Journal of Artificial Intelligence Tools

10.1142/s0218213014500043

2014

Vol 23 (05)

pp. 1450004

Author(s):

Ibrahim S. Alwatban

Ahmed Z. Emam

Keyword(s):

Data Mining

Association Rules

Association Rule

Association Rule Mining

Research Area

Privacy Preserving

Rule Mining

Comprehensive Survey

New Research

In recent years, a new research area known as privacy preserving data mining (PPDM) has emerged and captured the attention of many researchers interested in preventing the privacy violations that may occur during data mining. In this paper, we provide a review of studies on PPDM in the context of association rules (PPARM). This paper systematically defines the scope of this survey and determines the PPARM models. The problems of each model are formally described, and we discuss the relevant approaches, techniques and algorithms that have been proposed in the literature. A profile of each model and the accompanying algorithms are provided with a comparison of the PPARM models.

Bit Transformation Perturbative Masking Technique for Protecting Sensitive Information in Privacy Preserving Data Mining

International Journal of Database Management Systems

10.5121/ijdms.2010.2409

2010

Vol 2 (4)

pp. 107-114

Author(s):

S Vijayarani

A Tamilarasi

Keyword(s):

Data Mining

Privacy Preserving

Sensitive Information

Privacy Preserving Data Mining

Reducing Side Effects of Hiding Sensitive Itemsets in Privacy Preserving Data Mining

The Scientific World Journal

10.1155/2014/235837

2014

Vol 2014

pp. 1-12

Author(s):

Chun-Wei Lin

Tzung-Pei Hong

Hung-Chuan Hsu

Keyword(s):

Data Mining

Side Effects

Execution Time

Privacy Preserving

Sensitive Information

Confidential Data

Data mining is traditionally adopted to retrieve and analyze knowledge from large amounts of data. Private or confidential data may be sanitized or suppressed before it is shared or published. Privacy preserving data mining (PPDM) has thus become an important issue in recent years. The most common approach to PPDM is to sanitize the database to hide the sensitive information. In this paper, a novel hiding-missing-artificial utility (HMAU) algorithm is proposed to hide sensitive itemsets through transaction deletion. The transaction with the maximal ratio of sensitive to nonsensitive itemsets is selected and deleted entirely. Three side effects, namely hiding failures, missing itemsets, and artificial itemsets, are considered to evaluate whether a transaction must be deleted to hide the sensitive itemsets. Three weights express the relative importance of these three factors and can be set according to user requirements. Experiments then show the performance of the proposed algorithm in terms of execution time, number of deleted transactions, and number of side effects.
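Hiding by transaction deletion can be illustrated with a simplified greedy loop. This is a hypothetical sketch and not the HMAU algorithm itself. It omits the three weighted side-effect factors and uses absolute support counts. It also assumes a specific reading of the "ratio of sensitive to nonsensitive": the number of still-frequent sensitive itemsets a transaction contains, divided by one plus the number of its items outside every sensitive itemset. The +1 avoids division by zero. The function names are illustrative.

```python
def support(itemset, transactions):
    """Absolute support: number of transactions containing every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


def sensitivity_ratio(transaction, exposed, sensitive_items):
    """Hypothetical ratio: still-frequent sensitive itemsets in the transaction
    divided by (1 + number of its items outside all sensitive itemsets)."""
    n_sensitive = sum(1 for s in exposed if s <= transaction)
    n_nonsensitive = len(transaction - sensitive_items)
    return n_sensitive / (n_nonsensitive + 1)


def hide_by_deletion(transactions, sensitive, min_count):
    """Greedily delete whole transactions until every sensitive itemset has
    support below min_count. Returns (sanitized_db, deleted_transactions)."""
    if min_count < 1:
        raise ValueError("min_count must be at least 1")
    db = list(transactions)
    sensitive_items = frozenset().union(*sensitive)
    deleted = []
    while True:
        exposed = [s for s in sensitive if support(s, db) >= min_count]
        if not exposed:
            return db, deleted
        # Only transactions that still support an exposed itemset are candidates.
        candidates = [t for t in db if any(s <= t for s in exposed)]
        victim = max(candidates,
                     key=lambda t: sensitivity_ratio(t, exposed, sensitive_items))
        db.remove(victim)
        deleted.append(victim)


transactions = [frozenset("abc"), frozenset("ab"), frozenset("abd"),
                frozenset("cd"), frozenset("bd")]
sanitized, deleted = hide_by_deletion(transactions, [frozenset("ab")], min_count=2)
print(deleted)
```

Deleting whole transactions can hide the sensitive itemset completely. The cost is that some nonsensitive itemsets may drop below the threshold (missing itemsets), which is why HMAU weighs such side effects before choosing what to delete.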

Collusion-Free Privacy Preserving Data Mining

International Journal of Intelligent Information Technologies

10.4018/jiit.2010100103

2010

Vol 6 (4)

pp. 30-45

Author(s):

M. Rajalakshmi

T. Purusothaman

S. Pratheeba

Keyword(s):

Data Mining

Association Rule

Privacy Preserving

Frequent Itemsets

Data Sources

Sensitive Information

Distributed Data

Distributed Environment

Rule Mining

Privacy Preserving Data Mining

Distributed association rule mining is an integral part of data mining that extracts useful information hidden in distributed data sources. As local frequent itemsets are globalized from the data sources, sensitive information about individual data sources needs strong protection. Various privacy preserving data mining approaches for distributed environments have been proposed, but in the existing approaches, collusion among the participating sites reveals sensitive information about the other sites. In this paper, the authors propose a collusion-free algorithm for mining global frequent itemsets in a distributed environment with minimal communication among sites. The algorithm splits and sanitizes the itemsets and communicates with random sites in two different phases, making it difficult for colluders to retrieve sensitive information. Results show that the consequences of collusion are greatly reduced without affecting mining performance, and confirm optimal communication among sites.
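The general idea of splitting local counts so that no site receives another site's raw value can be sketched with additive secret sharing. This is a generic illustration, not the authors' two-phase splitting-and-sanitizing protocol. The modulus, the number of shares per count, and the function names are all assumptions. Each site splits its local support count for a candidate itemset into random shares that sum to the count modulo `MODULUS`, and sends each share to a randomly chosen site. Only aggregated sums are published.

```python
import random

MODULUS = 2 ** 32  # assumption: larger than any possible global support count


def split_count(count, n_shares, rng):
    """Split count into n_shares random values that sum to count mod MODULUS."""
    if n_shares < 1:
        raise ValueError("n_shares must be at least 1")
    shares = [rng.randrange(MODULUS) for _ in range(n_shares - 1)]
    shares.append((count - sum(shares)) % MODULUS)
    return shares


def global_support(local_counts, n_shares=3, seed=None):
    """Simulate aggregating local support counts so that no site
    receives another site's raw count."""
    rng = random.Random(seed)
    n_sites = len(local_counts)
    held = [0] * n_sites  # running sum of the shares each site has received
    for count in local_counts:
        for share in split_count(count, n_shares, rng):
            receiver = rng.randrange(n_sites)  # each share goes to a random site
            held[receiver] = (held[receiver] + share) % MODULUS
    # Sites publish only their aggregated sums; adding them recovers the total.
    return sum(held) % MODULUS


print(global_support([120, 45, 300, 7], seed=1))
```

Any proper subset of a count's shares is uniformly random. Roughly speaking, colluders must therefore gather every share of a site's count to recover it, so more shares per count make collusion harder at the cost of more messages.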

Collusion-Free Privacy Preserving Data Mining

Insights into Advancements in Intelligent Information Technologies

10.4018/978-1-4666-0158-1.ch015

2012

pp. 269-284

Author(s):

T. Purusothaman

M. Rajalakshmi

S. Pratheeba

Keyword(s):

Data Mining

Privacy Preserving

Frequent Itemsets

Data Sources

Sensitive Information

Distributed Data

Distributed Environment

Rule Mining

Distributed Association

Distributed association rule mining is an integral part of data mining that extracts useful information hidden in distributed data sources. As local frequent itemsets are globalized from the data sources, sensitive information about individual data sources needs strong protection. Various privacy preserving data mining approaches for distributed environments have been proposed, but in the existing approaches, collusion among the participating sites reveals sensitive information about the other sites. In this paper, the authors propose a collusion-free algorithm for mining global frequent itemsets in a distributed environment with minimal communication among sites. The algorithm splits and sanitizes the itemsets and communicates with random sites in two different phases, making it difficult for colluders to retrieve sensitive information. Results show that the consequences of collusion are greatly reduced without affecting mining performance, and confirm optimal communication among sites.

Modeling the KDD Process

Encyclopedia of Data Warehousing and Mining, Second Edition

10.4018/978-1-60566-010-3.ch207

2011

pp. 1337-1345

Author(s):

Vasudha Bhatnagar

S. K. Gupta

Keyword(s):

Data Mining

Knowledge Discovery

Continuous Process

Large Data

Research Field

Knowledge Discovery In Databases

Data Repositories

Domain Experts

Large Databases

New Research

Knowledge Discovery in Databases (KDD) is classically defined as the “nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in large databases” (Fayyad, Piatetsky-Shapiro & Smyth, 1996a). The recently developed KDD technology is based on a well-defined, multi-step “KDD process” for discovering knowledge from large data repositories. The basic problem addressed by the KDD process is that of mapping low-level data (operational in nature and too voluminous) to a more abstract form (a descriptive approximation or model of the process that generated the data) or a useful form (for example, a predictive model) (Fayyad, Piatetsky-Shapiro & Smyth, 1996b). The KDD process evolves with the proactive intervention of domain experts, data mining analysts, and end users. It is a ‘continuous’ process in the sense that its results may fuel new motivations for further discoveries (Chapman et al., 2000). Modeling and planning the KDD process has been recognized as a new research field (John, 2000). In this chapter we provide an introduction to the KDD process and present some models (conceptual as well as practical) for carrying out the KDD endeavor.

The Study of Privacy Preserving Data Mining Technology for Information Security

Applied Mechanics and Materials

10.4028/www.scientific.net/amm.556-562.3532

2014

Vol 556-562

pp. 3532-3535

Author(s):

Heng Li

Xue Fang Wu

Keyword(s):

Data Mining

Data Privacy

Privacy Preservation

Rapid Development

Privacy Preserving

Future Research

Mining Technology

Network Database

Use Of Data

With the rapid development of computer technology and the popularity of networks, the scale, scope, and depth of databases are constantly expanding, and vast amounts of stored data in different forms have accumulated. Data mining technology can extract valuable information from such large volumes of data. Privacy preservation has been one of the major concerns in data mining. Privacy preserving data mining has developed rapidly in a few short years, but it still faces many challenges. A number of methods and techniques have been developed for privacy preserving data mining. This paper analyzes the representative techniques for privacy preservation. Finally, the present problems and directions for future research are discussed.

Association Rule Hiding in Privacy Preserving Data Mining

Research Anthology on Privatizing and Securing Data

10.4018/978-1-7998-8954-0.ch044

2021

pp. 963-986

Author(s):

S. Vijayarani Mohan

Tamilarasi Angamuthu

Keyword(s):

Data Mining

Genetic Algorithm

Association Rules

Association Rule

Privacy Preserving

Sensitive Information

Hidden Information

Marketing Information

Mining Association Rule

This article describes how privacy preserving data mining has become one of the most important and interesting research directions in data mining. With the help of data mining techniques, people can extract hidden information and discover patterns and relationships among data items. In most situations, the extracted knowledge contains sensitive information about individuals and organizations, and this sensitive information can be misused for various purposes that violate individuals' privacy. Association rules frequently capture significant target-marketing information about a business. Significant association rules provide knowledge to the data miner because they effectively summarize the data while uncovering hidden relations among items that hold in the data. Association rule hiding techniques are used to protect the knowledge expressed by sensitive association rules during association rule mining. Association rule hiding refers to the process of modifying the original database in such a way that certain sensitive association rules disappear without seriously affecting the data and the non-sensitive rules. In this article, two new hiding techniques are proposed: a hiding technique based on a genetic algorithm (HGA) and a dummy items creation (DIC) technique. HGA hides sensitive association rules, while DIC hides the sensitive rules and also creates dummy items for the modified sensitive items. Experimental results show the performance of the proposed techniques.
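One common family of hiding heuristics lowers a sensitive rule's support or confidence by removing items from the transactions that support it. The sketch below illustrates that idea for a rule X → Y. It is neither HGA nor DIC, and all function names are hypothetical. Support is measured as an absolute count and confidence as a fraction. The choice of which consequent item to remove (`min(rhs)`) is an arbitrary assumption made for determinism.

```python
def support(itemset, transactions):
    """Absolute support: number of transactions containing every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


def confidence(lhs, rhs, transactions):
    """Confidence of the rule lhs -> rhs (0.0 if lhs never occurs)."""
    lhs_support = support(lhs, transactions)
    return support(lhs | rhs, transactions) / lhs_support if lhs_support else 0.0


def is_hidden(lhs, rhs, db, min_support, min_confidence):
    """A rule is hidden once it fails either mining threshold."""
    return support(lhs | rhs, db) < min_support or confidence(lhs, rhs, db) < min_confidence


def hide_rule(transactions, lhs, rhs, min_support, min_confidence):
    """Remove one consequent item from supporting transactions until the rule
    falls below min_support (absolute count) or min_confidence."""
    if not rhs:
        raise ValueError("rule consequent must be non-empty")
    db = [set(t) for t in transactions]  # copy so the input is not modified
    victim_item = min(rhs)  # assumption: arbitrary but deterministic choice
    for t in db:
        if is_hidden(lhs, rhs, db, min_support, min_confidence):
            break
        if (lhs | rhs) <= t:
            t.discard(victim_item)  # t no longer supports lhs | rhs
    return db


transactions = [{"bread", "milk"}, {"bread", "milk", "eggs"}, {"bread", "milk"},
                {"bread"}, {"milk", "eggs"}]
rule_lhs, rule_rhs = frozenset({"bread"}), frozenset({"milk"})
sanitized = hide_rule(transactions, rule_lhs, rule_rhs, min_support=2, min_confidence=0.6)
print(confidence(rule_lhs, rule_rhs, sanitized))
```

Removing items from real transactions can also lower the support of non-sensitive rules that share those items. Hiding techniques such as HGA aim to limit this kind of side effect.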