resilient distributed dataset
Recently Published Documents

In-Memory Data Anonymization Using Scalable and High Performance RDD Design

Electronics ◽ 10.3390/electronics9101732 ◽ 2020 ◽ Vol 9 (10) ◽ pp. 1732

Author(s): Sibghat Ullah Bazai ◽ Julian Jang-Jaccard

Keyword(s): Data Privacy ◽ High Performance ◽ Data Allocation ◽ Data Anonymization ◽ Inappropriate Use ◽ High Data ◽ Use Of Data ◽ New Novel ◽ Resilient Distributed Dataset ◽ Big Data Applications

Recent studies of data anonymization techniques have focused primarily on MapReduce. However, existing MapReduce-based approaches often suffer from performance overheads caused by inappropriate data allocation, expensive disk I/O and network transfer, and a lack of support for iterative tasks. We propose “SparkDA”, a novel anonymization technique designed to take full advantage of the Spark platform to generate privacy-preserving anonymized datasets efficiently. Our proposal offers better partition control, in-memory operation, and cache management for the iterative operations that are heavily utilised in data anonymization. It is built on Spark’s Resilient Distributed Dataset (RDD) and two critical RDD operations, FlatMapRDD and ReduceByKeyRDD. The experimental results demonstrate that our proposal outperforms the existing approaches in performance and scalability while maintaining high levels of data privacy and utility. This illustrates that our proposal can be used in a wider range of big data applications that demand privacy.
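
The abstract names flatMap-style and reduceByKey-style RDD operations but does not show how they combine. The sketch below illustrates one plausible pattern: generalize each record into a key, then count how many records share that key to check a k-anonymity threshold. It is not the SparkDA algorithm. The helpers `flat_map` and `reduce_by_key` are plain-Python stand-ins for Spark's `RDD.flatMap` and `RDD.reduceByKey`, and the `generalize` rule, the sample records, and the threshold `K` are all hypothetical.

```python
from itertools import chain


def flat_map(records, fn):
    """Plain-Python stand-in for RDD.flatMap: apply fn, then flatten the results."""
    return list(chain.from_iterable(fn(r) for r in records))


def reduce_by_key(pairs, fn):
    """Plain-Python stand-in for RDD.reduceByKey: merge the values of each key with fn."""
    merged = {}
    for key, value in pairs:
        merged[key] = fn(merged[key], value) if key in merged else value
    return merged


def generalize(record):
    """Hypothetical generalization rule: bucket age by decade, keep 3 ZIP digits."""
    age, zip_code = record
    decade = (age // 10) * 10
    key = (f"{decade}-{decade + 9}", zip_code[:3] + "**")
    return [(key, 1)]  # one (generalized key, count) pair per record


# Hypothetical quasi-identifier records: (age, ZIP code).
records = [(23, "53711"), (27, "53715"), (25, "53703"), (41, "60614")]

# Step 1: emit a generalized key for every record.
pairs = flat_map(records, generalize)

# Step 2: count the records in each equivalence class.
class_sizes = reduce_by_key(pairs, lambda a, b: a + b)

# Step 3: flag classes smaller than the assumed threshold K.
K = 2
violations = sorted(key for key, size in class_sizes.items() if size < K)
```

On an actual Spark RDD, the two steps would correspond to `rdd.flatMap(generalize).reduceByKey(lambda a, b: a + b)`. An iterative anonymizer would typically repeat the generalization with coarser buckets until no class falls below the threshold, which is where keeping the data cached in memory would matter.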

Spark Architecture and the Resilient Distributed Dataset

PySpark Recipes ◽ 10.1007/978-1-4842-3141-8_4 ◽ 2017 ◽ pp. 85-114

Author(s): Raju Kumar Mishra

Keyword(s): Resilient Distributed Dataset

Fast Computing of Microarray Data Using Resilient Distributed Dataset of Apache Spark

Recent Advances in Information and Communication Technology 2016 - Advances in Intelligent Systems and Computing ◽ 10.1007/978-3-319-40415-8_17 ◽ 2016 ◽ pp. 171-182

Author(s): Ransingh Biswajit Ray ◽ Mukesh Kumar ◽ Santanu Kumar Rath

Keyword(s): Microarray Data ◽ Apache Spark ◽ Resilient Distributed Dataset
