resilient distributed dataset
Recently Published Documents



Electronics, 2020, Vol 9 (10), pp. 1732
Author(s): Sibghat Ullah Bazai, Julian Jang-Jaccard

Recent studies of data anonymization techniques have focused primarily on MapReduce. However, existing MapReduce-based approaches often suffer from performance overheads caused by poor data allocation, expensive disk I/O and network transfer, and a lack of support for iterative tasks. We propose “SparkDA”, a novel anonymization technique designed to take full advantage of the Spark platform to generate privacy-preserving anonymized datasets efficiently. Our proposal offers better partition control, in-memory operation, and cache management for the iterative operations that are heavily utilised in data anonymization. It is built on Spark’s Resilient Distributed Dataset (RDD) and relies on two critical RDD operations, FlatMapRDD and ReduceByKeyRDD. The experimental results demonstrate that our proposal outperforms existing approaches in performance and scalability while maintaining high levels of data privacy and utility. This suggests that our proposal can be used in a wider range of big data applications that demand privacy.
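
The abstract names the two RDD operations but does not show how they are combined. As a rough illustration only, and not the SparkDA algorithm itself, the sketch below imitates the flatMap and reduceByKey pattern in plain Python so that it runs without a Spark cluster. The record layout, the `generalize` rule (age bucketed into decades, ZIP code truncated), the threshold `k`, and the helper names `flat_map` and `reduce_by_key` are all assumptions made for the example.

```python
def flat_map(records, fn):
    """Apply fn to each record and flatten the results (mimics RDD.flatMap)."""
    return [out for rec in records for out in fn(rec)]


def reduce_by_key(pairs, fn):
    """Merge the values of pairs that share a key (mimics RDD.reduceByKey)."""
    merged = {}
    for key, value in pairs:
        merged[key] = fn(merged[key], value) if key in merged else value
    return merged


def generalize(record):
    """Hypothetical generalization of the quasi-identifiers (age, ZIP code).

    Emits one (generalized_key, 1) pair per record; the sensitive value
    is dropped from the key.
    """
    age, zip_code, _diagnosis = record
    low = (age // 10) * 10
    key = (f"{low}-{low + 9}", zip_code[:3] + "**")
    return [(key, 1)]


# Made-up sample records: (age, ZIP code, sensitive attribute).
records = [
    (23, "40210", "flu"),
    (27, "40215", "cold"),
    (25, "40219", "flu"),
    (41, "51033", "asthma"),
]

# Count how many records fall into each generalized group.
class_sizes = reduce_by_key(flat_map(records, generalize), lambda a, b: a + b)

# Assumed privacy threshold: groups smaller than k need further generalization.
k = 2
violations = {key: n for key, n in class_sizes.items() if n < k}
```

On an actual PySpark RDD the same counting step would be written as `rdd.flatMap(generalize).reduceByKey(lambda a, b: a + b)`, with Spark handling partitioning and in-memory caching between iterations.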

