Efficient Two-Step Protocol and Its Discriminative Feature Selections in Secure Similar Document Detection

The risk of information disclosure has been increasing significantly. Accordingly, privacy-preserving data mining (PPDM) is being actively studied to obtain accurate mining results while preserving data privacy. We focus here on secure similar document detection (SSDD), which identifies similar documents of two parties when neither party discloses its own sensitive documents to the other party. In this paper, we propose an efficient two-step protocol that exploits feature selection as a lower-dimensional transformation, and we present discriminative feature selections to maximize the performance of the protocol. The proposed protocol consists of two steps: the filtering step and the postprocessing step. For the feature selection, we first consider the simplest one, random projection (RP), and propose its two-step solution, SSDD-RP. We then present two discriminative feature selections and their solutions: SSDD-LF, which selects a few dimensions that are locally frequent in the current querying vector, and SSDD-GF, which selects dimensions that are globally frequent in the set of all document vectors. We finally propose a hybrid one, SSDD-HF, which takes advantage of both SSDD-LF and SSDD-GF. We empirically show that the proposed two-step protocol outperforms the previous one-step protocol by three or four orders of magnitude.
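The filter-then-verify idea in this abstract can be illustrated with a minimal plaintext sketch. It is not the authors' protocol: it omits all cryptographic machinery, and it assumes Euclidean distance over term-frequency vectors, a similarity tolerance, and a particular reading of "locally frequent" (largest entries of the query vector) and "globally frequent" (largest column totals over all documents). All function names are hypothetical. Under these assumptions, a distance computed on a subset of dimensions never exceeds the full distance, so the filtering step cannot discard a true match.

```python
from math import sqrt

def select_local_frequent(query, k):
    """Assumed LF reading: indices of the k largest entries of the query vector."""
    return sorted(range(len(query)), key=lambda i: query[i], reverse=True)[:k]

def select_global_frequent(docs, k):
    """Assumed GF reading: indices of the k largest column totals over all documents."""
    totals = [sum(col) for col in zip(*docs)]
    return sorted(range(len(totals)), key=lambda i: totals[i], reverse=True)[:k]

def distance(u, v, dims=None):
    """Euclidean distance, optionally restricted to the selected dimensions."""
    dims = range(len(u)) if dims is None else dims
    return sqrt(sum((u[i] - v[i]) ** 2 for i in dims))

def two_step_search(query, docs, dims, tolerance):
    """Return (candidates, matches) as lists of document indices."""
    # Step 1 (filtering): the partial distance is a lower bound of the full
    # distance, so any document pruned here cannot be within the tolerance.
    candidates = [j for j, d in enumerate(docs)
                  if distance(query, d, dims) <= tolerance]
    # Step 2 (postprocessing): verify the survivors with the full distance.
    matches = [j for j in candidates if distance(query, docs[j]) <= tolerance]
    return candidates, matches

# Toy term-frequency vectors (made-up illustration data).
docs = [[5, 0, 1, 0], [0, 4, 0, 3], [5, 1, 1, 0], [4, 0, 0, 9]]
query = [5, 0, 1, 1]
dims = select_local_frequent(query, 2)
candidates, matches = two_step_search(query, docs, dims, tolerance=2.0)
print(dims, candidates, matches)
```

In this toy run the filtering step removes one document using only two of the four dimensions, and the postprocessing step removes one more false candidate. The better the selected dimensions discriminate, the fewer candidates reach the expensive second step, which is the motivation the abstract gives for the LF, GF, and HF selections.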

A Perturbation Method Based on Singular Value Decomposition and Feature Selection for Privacy Preserving Data Mining

International Journal of Data Warehousing and Mining ◽

10.4018/ijdwm.2014010104 ◽

2014 ◽

Vol 10 (1) ◽

pp. 55-76 ◽

Cited By ~ 1

Author(s):

Mohammad Reza Keyvanpour ◽

Somayyeh Seifi Moradi

Keyword(s):

Data Mining ◽

Feature Selection ◽

Singular Value Decomposition ◽

Perturbation Method ◽

Privacy Preserving ◽

Singular Value ◽

Privacy Preserving Data Mining ◽

Selection For ◽

Value Decomposition ◽

Different Levels

In this study, a new model is provided for customized privacy in privacy preserving data mining, in which the data owners define different levels of privacy for different features. Additionally, in order to improve perturbation methods, a method that combines singular value decomposition (SVD) with feature selection is defined so as to benefit from the advantages of both domains. Also, to assess the amount of distortion created by the proposed perturbation method, new distortion criteria are defined in which the distortion created in the process of feature selection is weighted by the privacy value of each feature. Tests and analysis of the results show that, compared to previous approaches, the method based on this model improves privacy, the accuracy of mining results, and the efficiency of privacy preserving data mining systems.
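As a rough illustration of the SVD side of such a perturbation, the sketch below replaces a data matrix with its rank-k approximation and reports the resulting distortion. It is an assumption-laden sketch, not the authors' method: the feature selection stage, the per-feature privacy levels, and the new distortion criteria are not specified in the abstract and are left out. The function names `svd_perturb` and `relative_distortion` and the sample matrix are hypothetical.

```python
import numpy as np

def svd_perturb(data, k):
    """Return the rank-k truncated-SVD approximation of `data` as a perturbed copy."""
    u, s, vt = np.linalg.svd(data, full_matrices=False)
    # Keep only the k largest singular values; the discarded ones carry the
    # fine detail that the perturbation is meant to blur.
    return (u[:, :k] * s[:k]) @ vt[:k, :]

def relative_distortion(original, perturbed):
    """Frobenius-norm distortion relative to the original matrix (one simple, assumed measure)."""
    return np.linalg.norm(original - perturbed) / np.linalg.norm(original)

# Rows are records, columns are features (made-up illustration data).
data = np.array([[4.0, 0.0, 1.0],
                 [3.0, 1.0, 0.0],
                 [0.0, 5.0, 2.0],
                 [1.0, 4.0, 3.0]])
perturbed = svd_perturb(data, k=2)
print(relative_distortion(data, perturbed))
```

A smaller k discards more singular values, which increases distortion and so trades mining accuracy for privacy.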

A Perturbation Method Based on Singular Value Decomposition and Feature Selection for Privacy Preserving Data Mining

Business Intelligence ◽

10.4018/978-1-4666-9562-7.ch015 ◽

2016 ◽

pp. 281-304

Author(s):

Mohammad Reza Keyvanpour ◽

Somayyeh Seifi Moradi

Keyword(s):

Data Mining ◽

Feature Selection ◽

Singular Value Decomposition ◽

Perturbation Method ◽

Privacy Preserving ◽

Singular Value ◽

Privacy Preserving Data Mining ◽

New Model ◽

Value Decomposition ◽

Different Levels

Data privacy in construction industry by privacy-preserving data mining (PPDM) approach

Asian Journal of Civil Engineering ◽

10.1007/s42107-020-00225-3 ◽

2020 ◽

Vol 21 (3) ◽

pp. 505-515

Author(s):

Tirth Patel ◽

Vejal Patel

Keyword(s):

Data Mining ◽

Construction Industry ◽

Data Privacy ◽

Privacy Preserving ◽

Privacy Preserving Data Mining

Study on distributed privacy preserving data mining

World Journal of Engineering ◽

10.1260/1708-5284.11.2.163 ◽

2014 ◽

Vol 11 (2) ◽

pp. 163-170

Author(s):

Binli Wang ◽

Yanguang Shen

Keyword(s):

Data Mining ◽

Data Privacy ◽

Rapid Development ◽

Privacy Preserving ◽

Future Research ◽

Distributed Data ◽

Distributed Environment ◽

Privacy Preserving Data Mining ◽

Advantages And Disadvantages ◽

Future Research Directions

With the recent rapid development of network, communications, and computer technology, privacy preserving data mining (PPDM) has become an increasingly important research topic in the field of data mining. In a distributed environment, how to protect data privacy while mining a large amount of distributed data is an even more far-reaching question. This paper describes current PPDM research at home and abroad. It then classifies the typical uses and algorithms of PPDM in distributed environments and summarizes their advantages and disadvantages. Finally, it points out future research directions in the field.

Privacy Preserving with Association Rule Mining using Evolutionary Algorithm

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.d9701.118419 ◽

2019 ◽

Vol 8 (4) ◽

pp. 11893-11899

Keyword(s):

Data Mining ◽

Optimization Algorithm ◽

Association Rule ◽

Association Rule Mining ◽

Data Privacy ◽

Privacy Preserving ◽

Rule Mining ◽

Privacy Preserving Data Mining ◽

Multiple Data ◽

Modified Algorithm

Privacy-preserving data mining (PPDM) is a novel field of study that aims to protect confidential information and to prevent its disclosure during the data mining process. This paper focuses on privacy preservation in vertically partitioned databases. The methodology designed for outsourced databases allows multiple data viewers as well as vendors to use their records efficiently and securely without compromising the secrecy of the data. Privacy Preserving Association Rule Mining (PPARM) is one such method, which aims to hide sensitive association rules. A new efficient approach takes advantage of novel optimization algorithms for sensitive association rule hiding. The goal is to leak as little information about the raw data as possible. The efficiency of the proposed method can be evaluated through experiments on different databases. Based on this optimization algorithm, a modified algorithm optimizes the association rules on vertically and horizontally partitioned databases, and its performance is studied.

Privacy Preserving Data Mining

10.5772/intechopen.99224 ◽

2021 ◽

Author(s):

Esma Ergüner Özkoç

Keyword(s):

Data Mining ◽

Data Privacy ◽

Personal Data ◽

Privacy Preserving ◽

Privacy Preserving Data Mining ◽

Data Mining Techniques ◽

Data Mining Algorithms ◽

Data Output ◽

The Individual ◽

Mining Algorithms

Data mining techniques provide benefits in many areas such as medicine, sports, marketing, and signal processing, as well as data and network security. However, although data mining techniques are used in security areas such as intrusion detection, biometric authentication, and fraud and malware classification, “privacy” has become a serious problem, especially in data mining applications that involve the collection and sharing of personal data. For these reasons, the problem of protecting privacy in the context of data mining differs from traditional data privacy protection, as data mining can act as both a friend and a foe. This chapter covers previously developed privacy preserving data mining techniques in two parts: (i) techniques proposed for input data that will be subject to data mining and (ii) techniques suggested for processed data (the output of the data mining algorithms). It also presents attacks against the privacy of data mining applications. The chapter concludes with a discussion of next-generation privacy-preserving data mining applications at both the individual and organizational levels.

The Study of Privacy Preserving Data Mining Technology for Information Security

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.556-562.3532 ◽

2014 ◽

Vol 556-562 ◽

pp. 3532-3535

Author(s):

Heng Li ◽

Xue Fang Wu

Keyword(s):

Data Mining ◽

Data Privacy ◽

Privacy Preservation ◽

Rapid Development ◽

Privacy Preserving ◽

Future Research ◽

Privacy Preserving Data Mining ◽

Mining Technology ◽

Network Database ◽

Use Of Data

With the rapid development of computer technology and the popularity of the network, the scale, scope, and depth of databases are constantly expanding, and vast amounts of data stored in different forms have accumulated. Data mining technology can extract valuable information from large volumes of data. Privacy preservation has been one of the greater concerns in data mining. Privacy preserving data mining has developed rapidly in a short time, but it still faces many challenges. A number of methods and techniques have been developed for privacy preserving data mining. This paper analyzes the representative techniques for privacy preservation. Finally, the present problems and directions for future research are discussed.

A Novel Privacy Preserving Data mining using improved decision tree and KP-ABE on High Dimensional Data

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.7.10874 ◽

2018 ◽

Vol 7 (2.7) ◽

pp. 515

Author(s):

Aaluri Seenu ◽

M Kameswara Rao

Keyword(s):

Data Mining ◽

Decision Tree ◽

Data Privacy ◽

Privacy Preserving ◽

Classification Model ◽

Distributed Data Mining ◽

High Dimensional ◽

Distributed Data ◽

Privacy Preserving Data Mining ◽

Tree Classifier

In a distributed data mining environment, maintaining individual data or patterns is a major issue due to high dimensionality and data size. A distributed data mining framework can help to find the essential decision-making patterns in distributed data. Privacy preserving data mining (PPDM) has emerged as a main research area for data confidentiality and knowledge sharing between communicating parties. As the distributed data of individuals are stored by a third party, the distributed information can be misused in digital networks. Most of the decision patterns generated using machine learning models for business organizations, industries, and individuals have to be encoded before they are publicly shared or published. As the amount of data collected from different sources is increasing exponentially, the time taken to preserve the patterns using traditional privacy preserving data mining models is also increasing, due to computationally expensive attribute selection measures and noise in the distributed data. Also, filling sparse values using conventional models is inefficient and infeasible for privacy preserving models. In this paper, a novel privacy preserving classification model is designed and implemented on large datasets. In this model, a filter-based privacy preserving model using an improved decision tree classifier is implemented to preserve the decision patterns using the IPPDM-KPABE model. Experimental results show that the proposed model has higher computational efficiency than the traditional privacy preserving model on high dimensional datasets.
