Data Privacy in Data Engineering, the Privacy Preserving Models and Techniques in Data Mining and Data Publishing: Contemporary Affirmation of the Recent Literature

Leakage and misuse of sensitive data are challenging problems for enterprises, and they have become more serious with the advent of cloud computing and big data. The reason is the growth in outsourcing data to public clouds and in publishing data for wider visibility. Privacy Preserving Data Publishing (PPDP), Privacy Preserving Data Mining (PPDM) and Privacy Preserving Distributed Data Mining (PPDDM) are therefore crucial today. PPDP and PPDM protect privacy at the data and process levels respectively. With big data, privacy protection has become indispensable because data is stored and processed in semi-trusted environments. In this paper we propose a comprehensive methodology for sanitizing data effectively, based on a misusability measure, to preserve privacy and prevent data leakage and misuse. We follow a hybrid approach that caters to the needs of privacy preserving MapReduce programming. We propose an algorithm, the Misusability Measure-Based Privacy Preserving Algorithm (MMPP), which considers the level of misusability before choosing and applying an appropriate sanitization to big data. Our empirical study with Amazon EC2 and EMR shows that the proposed methodology is useful in realizing privacy preserving MapReduce programming.

Secured Multi-Party Data Release on Cloud for Big Data Privacy-Preserving Using Fusion Learning

Turkish Journal of Computer and Mathematics Education (TURCOMAT) ◽

10.17762/turcomat.v12i3.1893 ◽

2021 ◽

Vol 12 (3) ◽

pp. 4716-4725

Author(s):

Divya Dangi et al.

Keyword(s):

Data Mining ◽

Data Privacy ◽

Algorithm Design ◽

Privacy Preserving ◽

Current Data ◽

Data Publishing ◽

Data Sets ◽

Complex Data ◽

Security Model ◽

Serial Data

Previous privacy protection research focuses on data sets that are never updated and need only a one-time release. Serial data publishing on complex data collections has received little attention in the literature, and it has not been treated fully. Existing methods either cannot withstand attackers with varied background knowledge or yield serially published data of low utility. A new generalization hypothesis is developed on the basis of a theoretical analysis, which effectively decreases the risk of disclosing certain sensitive attributes on re-publication. The results suggest that our algorithm achieves higher anonymity and lower hiding rates. Design and implementation of the proposed privacy preserving technique: in this phase the proposed technique is implemented to demonstrate the entire scenario of data aggregation and privacy preserving data mining. Comparative performance of the proposed technique and the traditional technique when applying C4.5: in this stage the performance is evaluated and a comparison with the standard algorithm is presented for the proposed data mining security model.

Privacy Preserving Data Publishing for Multiple Sensitive Attributes Based on Security Level

Information ◽

10.3390/info11030166 ◽

2020 ◽

Vol 11 (3) ◽

pp. 166

Author(s):

Yuelei Xiao ◽

Haiqi Li

Keyword(s):

Data Privacy ◽

Privacy Preserving ◽

Information Loss ◽

Experimental Results ◽

Data Publishing ◽

Security Level ◽

Sensitive Attribute ◽

Data Volume ◽

Security Levels ◽

Privacy Preserving Data Publishing

Privacy preserving data publishing has received considerable attention as a way to publish useful information while preserving data privacy. Existing privacy preserving data publishing methods for multiple sensitive attributes do not consider that different values of a sensitive attribute may have different sensitivity requirements. To solve this problem, we define three security levels for sensitive attribute values with different sensitivity requirements, and give an $L_{sl}$-diversity model for multiple sensitive attributes. We then propose three greedy algorithms that combine the maximal-bucket first (MBF), maximal single-dimension-capacity first (MSDCF) and maximal multi-dimension-capacity first (MMDCF) algorithms with the maximal security-level first (MSLF) greedy policy, named MBF based on MSLF (MBF-MSLF), MSDCF based on MSLF (MSDCF-MSLF) and MMDCF based on MSLF (MMDCF-MSLF), to implement the $L_{sl}$-diversity model for multiple sensitive attributes. The experimental results show that the three algorithms greatly reduce the information loss of the published microdata with only a small increase in runtime, and that their information loss tends to stabilize as data volume increases. They also solve the problem that the information loss of MBF, MSDCF and MMDCF increases greatly as the number of sensitive attributes grows.
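
For readers unfamiliar with diversity models, the following is a minimal sketch of the plain distinct l-diversity check extended to several sensitive attributes. It is not the authors' security-level-aware model or any of their MSLF-based algorithms, which the abstract does not specify in detail; the function name, the records and the attribute names are all hypothetical.

```python
from collections import defaultdict


def is_distinct_l_diverse(records, quasi_identifiers, sensitive_attributes, l):
    """Return True if every group of records sharing the same quasi-identifier
    values holds at least l distinct values for each sensitive attribute."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[q] for q in quasi_identifiers)
        groups[key].append(record)
    for group in groups.values():
        for attribute in sensitive_attributes:
            if len({r[attribute] for r in group}) < l:
                return False
    return True


# Hypothetical microdata whose quasi-identifiers are already generalized.
records = [
    {"age": "30-39", "zip": "130**", "disease": "flu", "salary": "low"},
    {"age": "30-39", "zip": "130**", "disease": "asthma", "salary": "high"},
    {"age": "40-49", "zip": "148**", "disease": "flu", "salary": "low"},
    {"age": "40-49", "zip": "148**", "disease": "flu", "salary": "mid"},
]

# The second group contains only one disease value, so the 2-diversity check fails.
result_l2 = is_distinct_l_diverse(records, ["age", "zip"], ["disease", "salary"], 2)
result_l1 = is_distinct_l_diverse(records, ["age", "zip"], ["disease", "salary"], 1)
```

A model with security levels would, presumably, require a different threshold depending on how sensitive each value is, rather than the single `l` used here.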

Privacy-Preserving Process Mining in Healthcare

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph17051612 ◽

2020 ◽

Vol 17 (5) ◽

pp. 1612 ◽

Cited By ~ 4

Author(s):

Anastasiia Pika ◽

Moe T. Wynn ◽

Stephanus Budiono ◽

Arthur H.M. ter Hofstede ◽

Wil M.P. van der Aalst ◽

...

Keyword(s):

Data Mining ◽

Data Privacy ◽

Process Mining ◽

Personal Data ◽

Data Transformation ◽

Privacy Preserving ◽

Sensitive Information ◽

Process Data ◽

Mining Community ◽

Healthcare Process

Process mining has been successfully applied in the healthcare domain and has helped to uncover various insights for improving healthcare processes. While the benefits of process mining are widely acknowledged, many people rightfully have concerns about irresponsible uses of personal data. Healthcare information systems contain highly sensitive information and healthcare regulations often require protection of data privacy. The need to comply with strict privacy requirements may result in a decreased data utility for analysis. Until recently, data privacy issues did not get much attention in the process mining community; however, several privacy-preserving data transformation techniques have been proposed in the data mining community. Many similarities between data mining and process mining exist, but there are key differences that make privacy-preserving data mining techniques unsuitable to anonymise process data (without adaptations). In this article, we analyse data privacy and utility requirements for healthcare process data and assess the suitability of privacy-preserving data transformation methods to anonymise healthcare data. We demonstrate how some of these anonymisation methods affect various process mining results using three publicly available healthcare event logs. We describe a framework for privacy-preserving process mining that can support healthcare process mining analyses. We also advocate the recording of privacy metadata to capture information about privacy-preserving transformations performed on an event log.

Privacy Preserving Big Data Publishing

Research Anthology on Privatizing and Securing Data ◽

10.4018/978-1-7998-8954-0.ch060 ◽

2021 ◽

pp. 1281-1298

Author(s):

Nancy Victor ◽

Daphne Lopez

Keyword(s):

Big Data ◽

Language Processing ◽

Data Privacy ◽

Privacy Preservation ◽

Personal Data ◽

Privacy Preserving ◽

Data Publishing ◽

Time Data ◽

Digital World ◽

Distributed Framework

Data privacy plays a noteworthy part in today's digital world, where information is gathered at exceptional rates from different sources. Privacy preserving data publishing refers to the process of publishing personal data without compromising the privacy of individuals in any manner. A variety of approaches have been devised to protect consumer privacy by applying traditional anonymization mechanisms. These mechanisms are not well suited to Big Data, however, because the data generated nowadays is not only structured. Data generated at very high velocity from various sources includes unstructured and semi-structured information, and is therefore very difficult to process using traditional mechanisms. This chapter focuses on the various challenges of Big Data, on PPDM and PPDP techniques for Big Data, and on how well they can be scaled to process historical and real-time data together using the Lambda architecture. A distributed framework for privacy preservation in Big Data that incorporates natural language processing techniques is also proposed in this chapter.

Pertinent Exploration of Privacy Preserving Perturbation Methods

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.f8007.038620 ◽

2020 ◽

Vol 8 (6) ◽

pp. 1945-1949

Keyword(s):

Mental Health ◽

Data Mining ◽

Data Privacy ◽

Perturbation Methods ◽

Privacy Preserving ◽

Privacy Leakage ◽

Huge Data ◽

Medical Sector ◽

Share Data ◽

Banking Business

The digital era generates huge amounts of data in many sectors, such as education, medicine, banking, business and marketing, which can be used for research, analysis, trend prediction, statistics and more. Data mining techniques are useful for finding patterns, trends and knowledge in such data. Data holders are reluctant to share data because of the risk of privacy leakage, yet sharing such data, especially medical data, greatly helps researchers to obtain knowledge from it. Privacy preserving data mining (PPDM) is one way for researchers to mine data for knowledge without breaching privacy. Within the medical sector, mental health is a field in which a high level of data confidentiality is both maintained and needed, so data owners are not ready to share data for research, even though mental health is now one of the most frequently discussed research topics. PPDM allows data to be shared with researchers while privacy is maintained through perturbation techniques, giving relief to doctors (the data owners). The current paper experiments with and analyses different perturbation methods for preserving privacy in data mining.
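
As an illustration of what a perturbation method does, here is a minimal sketch of additive Gaussian noise perturbation, one common technique in this family. The abstract does not name the specific methods the paper evaluates, so this is an assumed example; the function name, the noise level and the sample scores are hypothetical.

```python
import random
import statistics


def perturb_additive(values, sigma, seed=None):
    """Add independent zero-mean Gaussian noise with standard deviation sigma
    to each value, hiding individual values while roughly keeping aggregates."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, sigma) for v in values]


# Hypothetical numeric scores held by a data owner.
scores = [12.0, 15.0, 9.0, 20.0, 17.0, 11.0, 14.0, 18.0]
perturbed = perturb_additive(scores, sigma=2.0, seed=42)

# Individual values change; the means can be compared to judge utility loss.
print(statistics.mean(scores), statistics.mean(perturbed))
```

A larger `sigma` hides individual values better but distorts the statistics that researchers want to mine, which is the privacy-utility trade-off such experiments measure.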

Data privacy in construction industry by privacy-preserving data mining (PPDM) approach

Asian Journal of Civil Engineering ◽

10.1007/s42107-020-00225-3 ◽

2020 ◽

Vol 21 (3) ◽

pp. 505-515

Author(s):

Tirth Patel ◽

Vejal Patel

Keyword(s):

Data Mining ◽

Construction Industry ◽

Data Privacy ◽

Privacy Preserving ◽

Privacy Preserving Data Mining

A Survey on Privacy Preserving Dynamic Data Publishing

Research Anthology on Privatizing and Securing Data ◽

10.4018/978-1-7998-8954-0.ch079 ◽

2021 ◽

pp. 1635-1657

Author(s):

Salheddine Kabou ◽

Sidi Mohamed Benslimane ◽

Mhammed Mosteghanemi

Keyword(s):

Data Privacy ◽

Privacy Preservation ◽

Personal Information ◽

Privacy Preserving ◽

Data Publishing ◽

Future Research ◽

Dynamic Data ◽

Practical Applications ◽

New Process ◽

Future Research Directions

Many organizations, especially small and medium business (SMB) enterprises, need to collect and share data containing personal information. The privacy of this data must be preserved before it is outsourced to the commercial public. Privacy preserving data publishing (PPDP) refers to the process of publishing useful information while preserving data privacy. A variety of approaches have been proposed to ensure privacy by applying traditional anonymization models, which focus only on a single publication of a dataset. In practical applications, data publishing is more complicated: organizations publish multiple times for different recipients, or republish after modifications to provide up-to-date data. Privacy preserving dynamic data publishing (PPDDP) is a new process in privacy preservation that addresses the anonymization of data for these different purposes. In this survey, the authors systematically evaluate and summarize different studies on PPDDP, clarify the differences and requirements between the scenarios that can exist, and propose future research directions.

Privacy Preserving Classification of Biomedical Data With Secure Removing of Duplicate Records

Research Anthology on Privatizing and Securing Data ◽

10.4018/978-1-7998-8954-0.ch026 ◽

2021 ◽

pp. 569-588

Author(s):

Boudheb Tarik ◽

Elberrichi Zakaria

Keyword(s):

Data Mining ◽

Data Privacy ◽

Privacy Preserving ◽

Third Party ◽

Distributed Data ◽

Biomedical Data ◽

Collaborative Models ◽

Highly Sensitive ◽

Complete Access

Classification automatically assigns predefined classes to data and is one of the main applications of data mining. Having complete access to all data is critical for building accurate models. Data can be highly sensitive, such as biomedical data, which cannot be disclosed or shared with third parties because disclosure can harm individuals and organizations. The challenge is how to preserve both the privacy and the usefulness of data. Privacy preserving classification addresses this problem: collaborative models are constructed over networks without violating the data owners' privacy. In this article, the authors address two problems: privacy-preserving deduplication of identical records and privacy-preserving classification. They propose a randomized hash technique for deduplication and an enhanced privacy preserving classification of biomedical data over horizontally distributed data, based on two homomorphic encryption schemes. No private, intermediate or final results are disclosed. Experiments show that their solution is efficient and secure without loss of accuracy.
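
The idea of detecting duplicates without exposing record contents can be sketched with a keyed hash (HMAC-SHA256 from the Python standard library). This is a stand-in under stated assumptions, not the authors' randomized hash technique, which the abstract does not describe; the shared key, the record fields and both function names are hypothetical.

```python
import hashlib
import hmac


def record_fingerprint(record, key):
    """Compute a keyed fingerprint of a record; fields are sorted by name so
    that identical records give identical fingerprints regardless of order."""
    canonical = "|".join(f"{k}={record[k]}" for k in sorted(record)).encode("utf-8")
    return hmac.new(key, canonical, hashlib.sha256).hexdigest()


def deduplicate(records, key):
    """Keep the first occurrence of each record, comparing fingerprints only."""
    seen = set()
    unique = []
    for record in records:
        fingerprint = record_fingerprint(record, key)
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(record)
    return unique


# Hypothetical secret agreed between the data owners, unknown to third parties.
shared_key = b"hypothetical-shared-secret"
site_a = [{"patient": "p1", "test": "glucose", "value": 5.4}]
site_b = [
    {"value": 5.4, "test": "glucose", "patient": "p1"},  # same record as site_a
    {"patient": "p2", "test": "glucose", "value": 6.1},
]
merged = deduplicate(site_a + site_b, shared_key)
print(len(merged))
```

In a distributed setting the sites would exchange only the fingerprints rather than the records; the records are merged locally here purely to keep the sketch short.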

Study on distributed privacy preserving data mining

World Journal of Engineering ◽

10.1260/1708-5284.11.2.163 ◽

2014 ◽

Vol 11 (2) ◽

pp. 163-170

Author(s):

Binli Wang ◽

Yanguang Shen

Keyword(s):

Data Mining ◽

Data Privacy ◽

Rapid Development ◽

Privacy Preserving ◽

Future Research ◽

Distributed Data ◽

Distributed Environment ◽

Privacy Preserving Data Mining ◽

Advantages And Disadvantages ◽

Future Research Directions

Recently, with the rapid development of network, communication and computer technology, privacy preserving data mining (PPDM) has become an increasingly important research topic in the field of data mining. In a distributed environment, how to protect data privacy while mining large amounts of distributed data deserves further research. This paper describes current research on PPDM at home and abroad. It then focuses on classifying the typical uses and algorithms of PPDM in distributed environments and summarizes their advantages and disadvantages. Finally, it points out future research directions in the field.
