Privacy Preserving Parallel Clustering Based Anonymization for Big Data Using MapReduce Framework

Large, complex datasets have become a valuable resource for biomedical discovery because they expand the pool of scientific data from which useful information can be retrieved. However, indexing and retrieving patient information from disparate big data sources remains challenging in biomedical research. This work performs indexing and retrieval with a MapReduce framework based on the proposed Jaya-Sine Cosine Algorithm (Jaya–SCA). Initially, the input big data is distributed to the mappers at random. The average of the data in each mapper is computed and forwarded to the reducer, where it is stored as representative data. Each user query is first matched against the representatives in the reducer and then passed to the corresponding mapper to retrieve the best-matching result. This bilevel matching is based on the distance between the query and the stored data. The similarity measure is computed using the parametric-enabled similarity measure (PESM), cosine similarity and the proposed Jaya–SCA, which integrates the Jaya algorithm and the SCA. On the StatLog Heart Disease dataset, the proposed Jaya–SCA algorithm attained maximum F-measure, recall and precision of 0.5323, 0.4400 and 0.6867, respectively.
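The two-level lookup described in this abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation: it uses cosine similarity alone (PESM and the Jaya–SCA weighting are omitted), plain Python lists stand in for mappers and the reducer, and the function names `build_index` and `retrieve` are hypothetical.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(records, num_mappers, seed=0):
    """Assign records to mappers at random and average each mapper's data.

    Returns (mappers, representatives); representatives[i] is the
    column-wise mean of mappers[i] and plays the role of the reducer entry.
    """
    rng = random.Random(seed)
    mappers = [[] for _ in range(num_mappers)]
    for rec in records:
        mappers[rng.randrange(num_mappers)].append(rec)
    mappers = [m for m in mappers if m]  # drop mappers that received no data
    representatives = [
        [sum(col) / len(m) for col in zip(*m)] for m in mappers
    ]
    return mappers, representatives


def retrieve(query, mappers, representatives):
    """Level 1: pick the closest representative. Level 2: search that mapper."""
    best = max(
        range(len(mappers)),
        key=lambda i: cosine_similarity(query, representatives[i]),
    )
    return max(mappers[best], key=lambda rec: cosine_similarity(query, rec))
```

Because only one mapper is searched per query, the lookup avoids scanning every record, at the cost of missing a better match that happens to sit in a different mapper.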

Privacy Preserving in Big Data: Summary of Classification techniques

SSRN Electronic Journal ◽

10.2139/ssrn.3909074 ◽

2021 ◽

Author(s):

Tan Ruogang

Keyword(s):

Big Data ◽

Privacy Preserving ◽

Classification Techniques

Privacy Preserving for Big Data Based on Fuzzy Set

Cloud Computing and Security - Lecture Notes in Computer Science ◽

10.1007/978-3-030-00012-7_59 ◽

2018 ◽

pp. 651-659

Author(s):

Jun Wu ◽

Chunzhi Wang

Keyword(s):

Big Data ◽

Fuzzy Set ◽

Privacy Preserving

A Survey on Privacy Preserving Approaches on Health Care Big Data in Cloud

Proceeding of the International Conference on Computer Networks, Big Data and IoT (ICCBI - 2019) - Lecture Notes on Data Engineering and Communications Technologies ◽

10.1007/978-3-030-43192-1_4 ◽

2020 ◽

pp. 32-38

Author(s):

Lipsa Nayak ◽

V. Jayalakshmi

Keyword(s):

Health Care ◽

Big Data ◽

Privacy Preserving

Efficient and Privacy-Preserving Multi-User Outsourced K-Means Clustering

Computer and Information Science ◽

10.5539/cis.v14n2p26 ◽

2021 ◽

Vol 14 (2) ◽

pp. 26

Author(s):

Na Li ◽

Lianguan Huang ◽

Yanling Li ◽

Meng Sun

Keyword(s):

Data Mining ◽

Big Data ◽

Clustering Algorithm ◽

Privacy Preserving ◽

Locality Sensitive Hashing ◽

Sensitive Information ◽

The Public ◽

Big Data Mining ◽

Euclidean Distances ◽

Computational Resources

In recent years, with the development of the Internet, the volume of data on the network has grown explosively. Big data mining aims to obtain useful information through data processing such as clustering and classification. Clustering is an important branch of big data mining, and it is popular because of its simplicity. A new trend for clients who lack storage and computational resources is to outsource the data and the clustering task to public cloud platforms. However, because datasets used for clustering may contain sensitive information (e.g., identity information, health information), simply outsourcing them to cloud platforms cannot protect privacy. Clients therefore tend to encrypt their databases before uploading them to the cloud for clustering. In this paper, we focus on privacy protection and efficiency improvement for k-means clustering, and we propose a new privacy-preserving multi-user outsourced k-means clustering algorithm based on locality sensitive hashing (LSH). The algorithm encrypts databases with the Paillier cryptosystem and uses LSH to prune unnecessary computations during clustering; that is, it does not need to compute the Euclidean distance between every data record and every cluster center. Finally, theoretical and experimental results show that our algorithm is more efficient than most existing privacy-preserving k-means clustering schemes.
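The pruning idea in this abstract can be sketched on plaintext data. This is an illustration under stated assumptions, not the paper's scheme: Paillier encryption is left out entirely, a single p-stable (Gaussian projection) hash is assumed as the LSH family, and `make_hash` and `assign` are hypothetical names.

```python
import math
import random


def make_hash(dim, width, seed=0):
    """Build one p-stable LSH function h(v) = floor((a . v + b) / width)."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, width)

    def h(v):
        return math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / width)

    return h


def assign(points, centers, h):
    """Assign each point to a center, comparing only within its LSH bucket.

    Returns (labels, computed), where computed counts the distance
    evaluations actually performed. If no center shares a point's bucket,
    the point falls back to comparing against every center.
    """
    buckets = {}
    for idx, c in enumerate(centers):
        buckets.setdefault(h(c), []).append(idx)

    labels = []
    computed = 0
    for p in points:
        candidates = buckets.get(h(p)) or range(len(centers))
        computed += len(candidates)
        labels.append(min(candidates, key=lambda i: math.dist(p, centers[i])))
    return labels, computed
```

The assignment is approximate: a point's true nearest center can land in a different bucket, so the saving in distance computations is traded against some loss in clustering accuracy.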

Clustering based Privacy Preserving of Big Data using Fuzzification and Anonymization Operation

International Journal of Advanced Computer Science and Applications ◽

10.14569/ijacsa.2019.0101239 ◽

2019 ◽

Vol 10 (12) ◽

Cited By ~ 1

Author(s):

Saira Khan ◽

Khalid Iqbal ◽

Safi Faizullah ◽

Muhammad Fahad ◽

Jawad Ali ◽

...

Keyword(s):

Big Data ◽

Privacy Preserving

Privacy preserving-aware over big data in clouds using GSA and MapReduce framework

Privacy Preserving Over Big Data Through VSSFA and MapReduce Framework in Cloud Environment

CNB-MRF: Adapting Correlative Naive Bayes Classifier and MapReduce Framework for Big Data Classification

Privacy Preserving Framework for Big Data Management in Smart Buildings

Efficient indexing and retrieval of patient information from the big data using MapReduce framework and optimisation
