Robust and lossless data privacy preservation: optimal key based data sanitization

Author(s):  
G. K. Shailaja ◽  
C. V. Guru Rao
Author(s):  
Shalin Eliabeth S. ◽  
Sarju S.

Big data privacy preservation is one of the most disturbed issues in current industry. Sometimes the data privacy problems never identified when input data is published on cloud environment. Data privacy preservation in hadoop deals in hiding and publishing input dataset to the distributed environment. In this paper investigate the problem of big data anonymization for privacy preservation from the perspectives of scalability and time factor etc. At present, many cloud applications with big data anonymization faces the same kind of problems. For recovering this kind of problems, here introduced a data anonymization algorithm called Two Phase Top-Down Specialization (TPTDS) algorithm that is implemented in hadoop. For the data anonymization-45,222 records of adults information with 15 attribute values was taken as the input big data. With the help of multidimensional anonymization in map reduce framework, here implemented proposed Two-Phase Top-Down Specialization anonymization algorithm in hadoop and it will increases the efficiency on the big data processing system. By conducting experiment in both one dimensional and multidimensional map reduce framework with Two Phase Top-Down Specialization algorithm on hadoop, the better result shown in multidimensional anonymization on input adult dataset. Data sets is generalized in a top-down manner and the better result was shown in multidimensional map reduce framework by the better IGPL values generated by the algorithm. The anonymization was performed with specialization operation on taxonomy tree. The experiment shows that the solutions improves the IGPL values, anonymity parameter and decreases the execution time of big data privacy preservation by compared to the existing algorithm. This experimental result will leads to great application to the distributed environment.


Author(s):  
Yuanrui Dong ◽  
Peng Zhao ◽  
Hanqiao Yu ◽  
Cong Zhao ◽  
Shusen Yang

The emerging edge-cloud collaborative Deep Learning (DL) paradigm aims at improving the performance of practical DL implementations in terms of cloud bandwidth consumption, response latency, and data privacy preservation. Focusing on bandwidth efficient edge-cloud collaborative training of DNN-based classifiers, we present CDC, a Classification Driven Compression framework that reduces bandwidth consumption while preserving classification accuracy of edge-cloud collaborative DL. Specifically, to reduce bandwidth consumption, for resource-limited edge servers, we develop a lightweight autoencoder with a classification guidance for compression with classification driven feature preservation, which allows edges to only upload the latent code of raw data for accurate global training on the Cloud. Additionally, we design an adjustable quantization scheme adaptively pursuing the tradeoff between bandwidth consumption and classification accuracy under different network conditions, where only fine-tuning is required for rapid compression ratio adjustment. Results of extensive experiments demonstrate that, compared with DNN training with raw data, CDC consumes 14.9× less bandwidth with an accuracy loss no more than 1.06%, and compared with DNN training with data compressed by AE without guidance, CDC introduces at least 100% lower accuracy loss.


2019 ◽  
Vol 29 (1) ◽  
pp. 1441-1452 ◽  
Author(s):  
G.K. Shailaja ◽  
C.V. Guru Rao

Abstract Privacy-preserving data mining (PPDM) is a novel approach that has emerged in the market to take care of privacy issues. The intention of PPDM is to build up data-mining techniques without raising the risk of mishandling of the data exploited to generate those schemes. The conventional works include numerous techniques, most of which employ some form of transformation on the original data to guarantee privacy preservation. However, these schemes are quite multifaceted and memory intensive, thus leading to restricted exploitation of these methods. Hence, this paper intends to develop a novel PPDM technique, which involves two phases, namely, data sanitization and data restoration. Initially, the association rules are extracted from the database before proceeding with the two phases. In both the sanitization and restoration processes, key extraction plays a major role, which is selected optimally using Opposition Intensity-based Cuckoo Search Algorithm, which is the modified format of Cuckoo Search Algorithm. Here, four research issues, such as hiding failure rate, information preservation rate, and false rule generation, and degree of modification are minimized using the adopted sanitization and restoration processes.


Author(s):  
Mahmoud Barhamgi ◽  
Djamal Benslimane ◽  
Chirine Ghedira ◽  
Brahim Medjahed

Recent years have witnessed a growing interest in using Web services as a reliable means for medical data sharing inside and across healthcare organizations. In such service-based data sharing environments, Web service composition emerged as a viable approach to query data scattered across independent locations. Patient data privacy preservation is an important aspect that must be considered when composing medical Web services. In this paper, the authors show how data privacy can be preserved when composing and executing Web services. Privacy constraints are expressed in the form of RDF queries over a mediated ontology. Query rewriting algorithms are defined to process those queries while preserving users’ privacy.


Author(s):  
Nancy Victor ◽  
Daphne Lopez

Data privacy plays a noteworthy part in today's digital world where information is gathered at exceptional rates from different sources. Privacy preserving data publishing refers to the process of publishing personal data without questioning the privacy of individuals in any manner. A variety of approaches have been devised to forfend consumer privacy by applying traditional anonymization mechanisms. But these mechanisms are not well suited for Big Data, as the data which is generated nowadays is not just structured in manner. The data which is generated at very high velocities from various sources includes unstructured and semi-structured information, and thus becomes very difficult to process using traditional mechanisms. This chapter focuses on the various challenges with Big Data, PPDM and PPDP techniques for Big Data and how well it can be scaled for processing both historical and real-time data together using Lambda architecture. A distributed framework for privacy preservation in Big Data by combining Natural language processing techniques is also proposed in this chapter.


Sensors ◽  
2019 ◽  
Vol 19 (9) ◽  
pp. 2109
Author(s):  
Liming Fang ◽  
Minghui Li ◽  
Lu Zhou ◽  
Hanyi Zhang ◽  
Chunpeng Ge

A smart watch is a kind of emerging wearable device in the Internet of Things. The security and privacy problems are the main obstacles that hinder the wide deployment of smart watches. Existing security mechanisms do not achieve a balance between the privacy-preserving and data access control. In this paper, we propose a fine-grained privacy-preserving access control architecture for smart watches (FPAS). In FPAS, we leverage the identity-based authentication scheme to protect the devices from malicious connection and policy-based access control for data privacy preservation. The core policy of FPAS is two-fold: (1) utilizing a homomorphic and re-encrypted scheme to ensure that the ciphertext information can be correctly calculated; (2) dividing the data requester by different attributes to avoid unauthorized access. We present a concrete scheme based on the above prototype and analyze the security of the FPAS. The performance and evaluation demonstrate that the FPAS scheme is efficient, practical, and extensible.


Author(s):  
Geeta S. Navale ◽  
Suresh N. Mali

Nowadays, Data Sanitization is considered as a highly demanded area for solving the issue of privacy preservation in Data mining. Data Sanitization, means that the sensitive rules given by the users with the specific modifications and then releases the modified database so that, the unauthorized users cannot access the sensitive rules. Promisingly, the confidentiality of data is ensured against the data mining methods. The ultimate goal of this paper is to build an effective sanitization algorithm for hiding the sensitive rules given by users/experts. Meanwhile, this paper concentrates on minimizing the four sanitization research challenges namely, rate of hiding failure, rate of Information loss, rate of false rule generation and degree of modification. Moreover, this paper proposes a heuristic optimization algorithm named Self-Adaptive Firefly (SAFF) algorithm to generate the small length key for data sanitization and also to adopt lossless data sanitization and restoration. The generated optimized key is used for both data sanitation as well as the data restoration process. The proposed SAFF-based algorithm is compared and examined against the other existing sanitizing algorithms like Fire Fly (FF), Genetic Algorithm (GA), Particle Swarm Optimization (PSO) and Differential Evolution algorithm (DE) algorithms and the results have shown the excellent performance of proposed algorithm. The proposed algorithm is implemented in JAVA. The data set used are Chess, Retail, T10, and T40.


Sign in / Sign up

Export Citation Format

Share Document