scholarly journals Complementing Privacy and Utility Trade-Off with Self-Organising Maps

Cryptography ◽  
2021 ◽  
Vol 5 (3) ◽  
pp. 20
Author(s):  
Kabiru Mohammed ◽  
Aladdin Ayesh ◽  
Eerke Boiten

In recent years, data-enabled technologies have intensified the rate and scale at which organisations collect and analyse data. Data mining techniques are applied to realise the full potential of large-scale data analysis. These techniques are highly efficient in sifting through big data to extract hidden knowledge and assist evidence-based decisions, offering significant benefits to their adopters. However, this capability is constrained by important legal, ethical and reputational concerns. These concerns arise because they can be exploited to allow inferences to be made on sensitive data, thus posing severe threats to individuals’ privacy. Studies have shown Privacy-Preserving Data Mining (PPDM) can adequately address this privacy risk and permit knowledge extraction in mining processes. Several published works in this area have utilised clustering techniques to enforce anonymisation models on private data, which work by grouping the data into clusters using a quality measure and generalising the data in each group separately to achieve an anonymisation threshold. However, existing approaches do not work well with high-dimensional data, since it is difficult to develop good groupings without incurring excessive information loss. Our work aims to complement this balancing act by optimising utility in PPDM processes. To illustrate this, we propose a hybrid approach, that combines self-organising maps with conventional privacy-based clustering algorithms. We demonstrate through experimental evaluation, that results from our approach produce more utility for data mining tasks and outperforms conventional privacy-based clustering algorithms. This approach can significantly enable large-scale analysis of data in a privacy-preserving and trustworthy manner.

Author(s):  
D. Radhika ◽  
D. Aruna Kumari

Leakage and misuse of sensitive data is a challenging problem to enterprises. It has become more serious problem with the advent of cloud and big data. The rationale behind this is the increase in outsourcing of data to public cloud and publishing data for wider visibility. Therefore Privacy Preserving Data Publishing (PPDP), Privacy Preserving Data Mining (PPDM) and Privacy Preserving Distributed Data Mining (PPDM) are crucial in the contemporary era. PPDP and PPDM can protect privacy at data and process levels respectively. Therefore, with big data privacy to data became indispensable due to the fact that data is stored and processed in semi-trusted environment. In this paper we proposed a comprehensive methodology for effective sanitization of data based on misusability measure for preserving privacy to get rid of data leakage and misuse. We followed a hybrid approach that caters to the needs of privacy preserving MapReduce programming. We proposed an algorithm known as Misusability Measure-Based Privacy serving Algorithm (MMPP) which considers level of misusability prior to choosing and application of appropriate sanitization on big data. Our empirical study with Amazon EC2 and EMR revealed that the proposed methodology is useful in realizing privacy preserving Map Reduce programming.


2013 ◽  
Vol 12 (5) ◽  
pp. 3443-3451
Author(s):  
Rajesh Pasupuleti ◽  
Narsimha Gugulothu

Clustering analysis initiatives  a new direction in data mining that has major impact in various domains including machine learning, pattern recognition, image processing, information retrieval and bioinformatics. Current clustering techniques address some of the  requirements not adequately and failed in standardizing clustering algorithms to support for all real applications. Many clustering methods mostly depend on user specified parametric methods and initial seeds of clusters are randomly selected by  user.  In this paper, we proposed new clustering method based on linear approximation of function by getting over all idea of behavior knowledge of clustering function, then pick the initial seeds of clusters as the points on linear approximation line and perform clustering operations, unlike grouping data objects into clusters by using distance measures, similarity measures and statistical distributions in traditional clustering methods. We have shown experimental results as clusters based on linear approximation yields good  results in practice with an example of  business data are provided.  It also  explains privacy preserving clusters of sensitive data objects.


2011 ◽  
Vol 403-408 ◽  
pp. 920-928 ◽  
Author(s):  
Nekuri Naveen ◽  
V. Ravi ◽  
C. Raghavendra Rao

In the last two decades in areas like banking, finance and medical research privacy policies restrict the data owners to share the data for data mining purpose. This issue throws up a new area of research namely privacy preserving data mining. In this paper, we proposed a privacy preservation method by employing Particle Swarm Optimization (PSO) trained Auto Associative Neural Network (PSOAANN). The modified (privacy preserved) input values are fed to a decision tree (DT) and a rule induction algorithm viz., Ripper for rule extraction purpose. The performance of the hybrid is tested on four benchmark and bankruptcy datasets using 10-fold cross validation. The results are compared with those obtained using the original datasets where privacy is not preserved. The proposed hybrid approach achieved good results in all datasets.


2009 ◽  
Vol 68 (11) ◽  
pp. 1224-1236 ◽  
Author(s):  
Emmanouil Magkos ◽  
Manolis Maragoudakis ◽  
Vassilis Chrissikopoulos ◽  
Stefanos Gritzalis

2008 ◽  
pp. 2402-2420
Author(s):  
Lixin Fu ◽  
Hamid Nemati ◽  
Fereidoon Sadri

Privacy-preserving data mining (PPDM) refers to data mining techniques developed to protect sensitive data while allowing useful information to be discovered from the data. In this article, we review PPDM and present a broad survey of related issues, techniques, measures, applications, and regulation guidelines. We observe that the rapid pace of change in information technologies available to sustain PPDM has created a gap between theory and practice. We posit that without a clear understanding of the practice, this gap will be widening which, ultimately, will be detrimental to the field. We conclude by proposing a comprehensive research agenda intended to bridge the gap relevant to practice and as a reference basis for the future related legislation activities.


Author(s):  
Sumana M. ◽  
Hareesha K. S. ◽  
Sampath Kumar

Essential predictions are to be made by the parties distributed at multiple locations. However, in the process of building a model, perceptive data is not to be revealed. Maintaining the privacy of such data is a foremost concern. Earlier approaches developed for classification and prediction are proven not to be secure enough and the performance is affected. This chapter focuses on the secure construction of commonly used classifiers. The computations performed during model building are proved to be semantically secure. The homomorphism and probabilistic property of Paillier is used to perform secure product, mean, and variance calculations. The secure computations are performed without any intermediate data or the sensitive data at multiple sites being revealed. It is observed that the accuracy of the classifiers modeled is almost equivalent to the non-privacy preserving classifiers. Secure protocols require reduced computation time and communication cost. It is also proved that proposed privacy preserving classifiers perform significantly better than the base classifiers.


Author(s):  
Gautam Manohar ◽  
Conrad Tucker

This paper proposes a privacy preserving data mining driven methodology for predicting emerging human threats in a public space by capturing large scale, real time body movement data (spatial data represented in X, Y, Z coordinate space) using Red-Green-Blue (RGB) image, infrared depth and skeletal image sensing technology. Unlike traditional passive surveillance systems (e.g., CCTV video surveillance systems), multimodal surveillance technologies have the ability to capture multiple data streams in a real time dynamic manner. However, mathematical models based on machine learning principles are needed to convert the large-scale data into knowledge to serve as a decision support system for autonomously predicting emerging threats, rather than just recording and observing them as they occur. To this end, the authors of this work present a privacy preserving data mining driven methodology that captures emergent behavior of individuals in a public space and classifies them as a threat or not a threat, based on the underlying body movements through space and time. An audience in a public environment is presented as the case study for this paper with the aim of classifying individuals in the audience as threats (or not), based on their temporal body behavior profiles.


Sign in / Sign up

Export Citation Format

Share Document