Privacy Preserving Classification of Biomedical Data With Secure Removing of Duplicate Records

Classification automatically assigns predefined classes to data and is one of the main applications of data mining. Complete access to all the data is critical for building accurate models. However, data such as biomedical records can be highly sensitive and cannot be disclosed to or shared with third parties, because disclosure can harm individuals and organizations. The challenge is to preserve both the privacy and the usefulness of the data. Privacy-preserving classification addresses this problem: collaborative models are constructed over networks without violating the data owners' privacy. In this article, the authors address two problems: privacy-preserving deduplication of identical records and privacy-preserving classification. They propose a randomized hash technique for deduplication and an enhanced privacy-preserving classification of biomedical data over horizontally distributed data, based on two homomorphic encryption schemes. No private, intermediate or final results are disclosed. Experiments show that their solution is efficient and secure without loss of accuracy.
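To make the deduplication idea concrete, the following is a minimal sketch and not the authors' protocol. It assumes the data owners share a secret key and use a keyed hash (HMAC-SHA256) as a stand-in for the randomized hash mentioned in the abstract. Each party tokenizes its records locally, and a coordinator that sees only tokens reports which record indices each party should keep. The names `record_token`, `local_tokens` and `indices_to_keep` are hypothetical.

```python
import hashlib
import hmac


def record_token(record: dict, shared_key: bytes) -> str:
    """Return a keyed hash of the record's canonical form."""
    # Canonicalise so that identical records map to identical tokens:
    # sort the field names, lower-case the values and trim whitespace.
    canonical = "|".join(
        f"{field}={str(record[field]).strip().lower()}" for field in sorted(record)
    )
    return hmac.new(shared_key, canonical.encode("utf-8"), hashlib.sha256).hexdigest()


def local_tokens(records: list, shared_key: bytes) -> list:
    """Run by each party on its own data; only the tokens leave the site."""
    return [record_token(r, shared_key) for r in records]


def indices_to_keep(tokens_by_party: dict) -> dict:
    """Run by the coordinator, which never sees raw records.

    The first occurrence of each token is kept; later duplicates are dropped.
    """
    seen = set()
    keep = {}
    for party, tokens in tokens_by_party.items():
        keep[party] = []
        for index, token in enumerate(tokens):
            if token not in seen:
                seen.add(token)
                keep[party].append(index)
    return keep


# Assumed example data: two hospitals holding one record in common.
key = b"demo-key-shared-by-the-data-owners"
hospital_a = [{"id": "P1", "age": 40}, {"id": "P2", "age": 55}]
hospital_b = [{"age": 40, "id": "p1 "}, {"id": "P3", "age": 61}]
result = indices_to_keep(
    {"a": local_tokens(hospital_a, key), "b": local_tokens(hospital_b, key)}
)
```

Because the key is unknown to the coordinator, it cannot recompute tokens for guessed records. Whether the paper's scheme offers the same or stronger guarantees is not stated in the abstract.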

Study on distributed privacy preserving data mining

World Journal of Engineering ◽

10.1260/1708-5284.11.2.163 ◽

2014 ◽

Vol 11 (2) ◽

pp. 163-170

Author(s):

Binli Wang ◽

Yanguang Shen

Keyword(s):

Data Mining ◽

Data Privacy ◽

Rapid Development ◽

Privacy Preserving ◽

Future Research ◽

Distributed Data ◽

Distributed Environment ◽

Privacy Preserving Data Mining ◽

Advantages And Disadvantages ◽

Future Research Directions

Recently, with the rapid development of network, communication and computer technology, privacy preserving data mining (PPDM) has become an increasingly important research area in the field of data mining. In a distributed environment, how to protect data privacy while mining a large amount of distributed data is a more far-reaching problem. This paper describes current research on PPDM at home and abroad. It then focuses on classifying the typical uses and algorithms of PPDM in distributed environments and summarizing their advantages and disadvantages. Finally, it points out future research directions in the field.

A Novel Privacy Preserving Data mining using improved decision tree and KP-ABE on High Dimensional Data

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.7.10874 ◽

2018 ◽

Vol 7 (2.7) ◽

pp. 515

Author(s):

Aaluri Seenu ◽

M Kameswara Rao

Keyword(s):

Data Mining ◽

Decision Tree ◽

Data Privacy ◽

Privacy Preserving ◽

Classification Model ◽

Distributed Data Mining ◽

High Dimensional ◽

Distributed Data ◽

Privacy Preserving Data Mining ◽

Tree Classifier

In a distributed data mining environment, maintaining individual data or patterns is a major issue because of high dimensionality and data size. A distributed data mining framework can help to find the essential decision-making patterns in distributed data. Privacy preserving data mining (PPDM) has emerged as a main research area for data confidentiality and knowledge sharing between communicating parties. Because individuals' distributed data are stored by a third party, the distributed information can be misused in digital networks. Most of the decision patterns generated using machine learning models for business organizations, industries and individuals have to be encoded before they are publicly shared or published. As the amount of data collected from different sources increases exponentially, the time taken to preserve the patterns using traditional privacy preserving data mining models also increases, owing to computationally expensive attribute selection measures and noise in the distributed data. Also, filling sparse values using conventional models is inefficient and infeasible for privacy preserving models. In this paper, a novel privacy preserving classification model was designed and implemented on large datasets. In this model, a filter-based privacy preserving model using an improved decision tree classifier is implemented to preserve the decision patterns using the IPPDM-KPABE model. Experimental results showed that the proposed model has higher computational efficiency than the traditional privacy preserving model on high dimensional datasets.

Classification of Privacy-preserving Distributed Data Mining protocols

2011 Sixth International Conference on Digital Information Management ◽

10.1109/icdim.2011.6093356 ◽

2011 ◽

Cited By ~ 11

Author(s):

Zhuojia Xu ◽

Xun Yi

Keyword(s):

Data Mining ◽

Privacy Preserving ◽

Distributed Data Mining ◽

Distributed Data

Misusability Measure Based Sanitization of Big Data for Privacy Preserving MapReduce Programming

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v8i6.pp4524-4532 ◽

2018 ◽

Vol 8 (6) ◽

pp. 4524

Author(s):

D. Radhika ◽

D. Aruna Kumari

Keyword(s):

Data Mining ◽

Big Data ◽

Data Privacy ◽

Hybrid Approach ◽

Privacy Preserving ◽

Data Publishing ◽

Distributed Data Mining ◽

Distributed Data ◽

Public Cloud ◽

Sensitive Data

Leakage and misuse of sensitive data is a challenging problem for enterprises. It has become a more serious problem with the advent of cloud and big data, because of the increase in outsourcing data to public clouds and publishing data for wider visibility. Therefore, Privacy Preserving Data Publishing (PPDP), Privacy Preserving Data Mining (PPDM) and Privacy Preserving Distributed Data Mining (PPDM) are crucial in the contemporary era. PPDP and PPDM can protect privacy at the data and process levels respectively. With big data, privacy protection has become indispensable because data is stored and processed in semi-trusted environments. In this paper we propose a comprehensive methodology for effective sanitization of data based on a misusability measure, preserving privacy to prevent data leakage and misuse. We follow a hybrid approach that caters to the needs of privacy preserving MapReduce programming. We propose an algorithm known as the Misusability Measure-Based Privacy Preserving Algorithm (MMPP), which considers the level of misusability before choosing and applying appropriate sanitization to big data. Our empirical study with Amazon EC2 and EMR revealed that the proposed methodology is useful in realizing privacy preserving MapReduce programming.

Performance analysis of privacy preserving distributed data mining based on cryptographic techniques

2021 7th International Conference on Electrical Energy Systems (ICEES) ◽

10.1109/icees51510.2021.9383673 ◽

2021 ◽

Author(s):

Venkatesh Kumar Marimuthu ◽

C. Lakshmi

Keyword(s):

Data Mining ◽

Performance Analysis ◽

Privacy Preserving ◽

Distributed Data Mining ◽

Distributed Data ◽

Cryptographic Techniques

Towards Distributed Association Rule Mining Privacy

Application of Agents and Intelligent Information Technologies - Advances in Intelligent Information Technologies ◽

10.4018/978-1-59904-265-7.ch011 ◽

2011 ◽

pp. 245-271

Author(s):

Mafruz Ashrafi ◽

David Taniar ◽

Kate Smith

Keyword(s):

Data Mining ◽

Data Privacy ◽

Large Data ◽

Digital Data ◽

Sensitive Information ◽

Distributed Data ◽

Data Repositories ◽

Actionable Knowledge ◽

The Cost ◽

Network Technologies

With the advancement of storage, retrieval, and network technologies today, the amount of information available to each organization is literally exploding. It is widely recognized, however, that data as an organizational asset often becomes a liability because the cost to acquire and manage the data is far more than the value derived from it. Thus, the success of modern organizations relies not only on their capability to acquire and manage their data but also on their efficiency in deriving useful actionable knowledge from it. To explore and analyze large data repositories and discover useful actionable knowledge from them, modern organizations have used a technique known as data mining, which analyzes voluminous digital data and discovers hidden but useful patterns in it. However, the discovery of hidden patterns has statistical meaning and may often disclose sensitive information. As a result, privacy has become one of the prime concerns in the data-mining research community. Since distributed data mining discovers rules by combining local models from various distributed sites, breaches of data privacy happen more often than they do in centralized environments.
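As an illustration of how local results might be combined without exposing each site's contribution, the sketch below uses additive secret sharing to compute a secure sum of local support counts. This is a standard technique swapped in for illustration, not the method of the chapter. The function names and the modulus are assumptions, and the simulation runs all sites in one process, whereas a real deployment would send each share over the network.

```python
import secrets

# Assumed public modulus; it must exceed the true total of all counts.
MODULUS = 2**61 - 1


def split_into_shares(value: int, n_sites: int) -> list:
    """Split a local count into n random shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_sites - 1)]
    last_share = (value - sum(shares)) % MODULUS
    return shares + [last_share]


def secure_sum(local_counts: list) -> int:
    """Simulate the protocol: no site reveals its raw count to any other site."""
    n_sites = len(local_counts)
    # share_matrix[i][j] is the share that site i sends to site j.
    share_matrix = [split_into_shares(count, n_sites) for count in local_counts]
    # Each site j adds up the shares it received and publishes only that sum.
    partial_sums = [
        sum(share_matrix[i][j] for i in range(n_sites)) % MODULUS
        for j in range(n_sites)
    ]
    # The published partial sums add up to the global support count.
    return sum(partial_sums) % MODULUS


# Assumed local support counts of one itemset at three sites.
global_support = secure_sum([120, 45, 300])
```

Each individual share is uniformly random, so a single share reveals nothing about the count it came from. The sketch assumes sites follow the protocol honestly and do not pool the shares they receive.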

Privacy-Preserving Process Mining in Healthcare

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph17051612 ◽

2020 ◽

Vol 17 (5) ◽

pp. 1612 ◽

Cited By ~ 4

Author(s):

Anastasiia Pika ◽

Moe T. Wynn ◽

Stephanus Budiono ◽

Arthur H.M. ter Hofstede ◽

Wil M.P. van der Aalst ◽

...

Keyword(s):

Data Mining ◽

Data Privacy ◽

Process Mining ◽

Personal Data ◽

Data Transformation ◽

Privacy Preserving ◽

Sensitive Information ◽

Process Data ◽

Mining Community ◽

Healthcare Process

Process mining has been successfully applied in the healthcare domain and has helped to uncover various insights for improving healthcare processes. While the benefits of process mining are widely acknowledged, many people rightfully have concerns about irresponsible uses of personal data. Healthcare information systems contain highly sensitive information and healthcare regulations often require protection of data privacy. The need to comply with strict privacy requirements may result in a decreased data utility for analysis. Until recently, data privacy issues did not get much attention in the process mining community; however, several privacy-preserving data transformation techniques have been proposed in the data mining community. Many similarities between data mining and process mining exist, but there are key differences that make privacy-preserving data mining techniques unsuitable to anonymise process data (without adaptations). In this article, we analyse data privacy and utility requirements for healthcare process data and assess the suitability of privacy-preserving data transformation methods to anonymise healthcare data. We demonstrate how some of these anonymisation methods affect various process mining results using three publicly available healthcare event logs. We describe a framework for privacy-preserving process mining that can support healthcare process mining analyses. We also advocate the recording of privacy metadata to capture information about privacy-preserving transformations performed on an event log.
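The kind of transformation discussed here can be sketched as follows. This is a minimal example under stated assumptions, not the framework from the article. It pseudonymises case identifiers with a salted SHA-256 hash, generalises timestamps to the day, and returns a small privacy-metadata record describing what was done. The event field names (`case_id`, `activity`, `timestamp`) and the function `anonymise_log` are hypothetical.

```python
import hashlib
from datetime import datetime


def anonymise_log(events: list, salt: str) -> tuple:
    """Return an anonymised copy of an event log plus privacy metadata."""
    anonymised = []
    for event in events:
        # Pseudonymise the case identifier with a salted hash, truncated for readability.
        digest = hashlib.sha256((salt + event["case_id"]).encode("utf-8")).hexdigest()
        # Generalise the timestamp from second precision to the calendar day.
        day = datetime.fromisoformat(event["timestamp"]).date().isoformat()
        anonymised.append(
            {"case_id": digest[:12], "activity": event["activity"], "timestamp": day}
        )
    # Record which transformations were applied, so later analyses can account for them.
    metadata = {
        "case_id": "pseudonymised (salted SHA-256, truncated to 12 hex characters)",
        "timestamp": "generalised to day",
    }
    return anonymised, metadata


# Assumed example log with two events of the same patient case.
log = [
    {"case_id": "patient-17", "activity": "Triage", "timestamp": "2020-01-05T10:30:00"},
    {"case_id": "patient-17", "activity": "X-ray", "timestamp": "2020-01-05T11:45:00"},
]
anon_log, privacy_metadata = anonymise_log(log, salt="demo-salt")
```

Truncating timestamps to the day removes the ordering of events within the same day, which is one way such a transformation can change process mining results.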

Application of Oblivious Transfer Protocol in Distributed Data Mining with Privacy-preserving

The First International Symposium on Data, Privacy, and E-Commerce (ISDPE 2007) ◽

10.1109/isdpe.2007.39 ◽

2007 ◽

Cited By ~ 3

Author(s):

Weiping Wang ◽

Bing Deng ◽

Zhepeng Li

Keyword(s):

Data Mining ◽

Oblivious Transfer ◽

Privacy Preserving ◽

Distributed Data Mining ◽

Distributed Data ◽

Transfer Protocol

Brief Announcement: Privacy Preserving Mining of Distributed Data Using a Trusted and Partitioned Third Party

Lecture Notes in Computer Science - Cyber Security Cryptography and Machine Learning ◽

10.1007/978-3-319-60080-2_14 ◽

2017 ◽

pp. 193-195 ◽

Cited By ~ 1

Author(s):

Nir Maoz ◽

Ehud Gudes

Keyword(s):

Privacy Preserving ◽

Third Party ◽

Distributed Data