Clustering Algorithm for Privacy Preservation on MapReduce

2019 ◽

Vol 13 ◽

Author(s):

Shobana G ◽

S. Shankar

Keyword(s):

Relative Error ◽

Privacy Preservation ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Sensitive Information ◽

K Value ◽

Identity Disclosure ◽

Anonymized Data ◽

High Utility ◽

Privacy And Confidentiality

Background: The growing need for data publishers to release or share healthcare datasets poses a threat to the privacy and confidentiality of Electronic Medical Records. The main goal is to share useful information, maximizing utility while ensuring that sensitive information is not disclosed. There is always a utility-privacy tradeoff, which must be handled properly so that researchers can learn the statistical properties of the datasets. Objective: This article introduces a novel SK-Clustering algorithm that overcomes identity disclosure, attribute disclosure and similarity attacks. The algorithm is evaluated using the discernibility measure and relative error to compare its performance with other clustering algorithms. Methodology: The SK-Clustering algorithm flexibly adjusts the level of protection to achieve high utility. The size of the clusters is also minimized dynamically based on the required level of protection, and extra tuples are added accordingly. This drastically reduces information loss, thereby increasing utility. Result and Conclusion: For a k-value of 50, the discernibility measure of the SK algorithm is 65000, whereas the Mondrian algorithm yields 70000 and the Anatomy algorithm 150000. Similarly, the relative error of the proposed algorithm is less than 10% for a tuple count of 35000 when compared with other k-anonymity algorithms. The proposed algorithm performs better in terms of both discernibility measure and relative error, thereby providing higher data utility than traditional algorithms.
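The abstract does not define its two evaluation metrics, so the following is only a minimal sketch under assumed definitions: the discernibility measure is taken as the sum of squared equivalence-class sizes, as commonly used in the k-anonymity literature, and relative error as the normalized difference between a count query on the original and on the anonymized data. The function names, the toy table and the query counts are hypothetical and are not taken from the paper.

```python
from collections import Counter

def discernibility_metric(records, quasi_identifiers):
    """Sum of squared equivalence-class sizes (lower means less information loss)."""
    # Records sharing the same generalized quasi-identifier values form one class
    class_sizes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return sum(size * size for size in class_sizes.values())

def relative_error(true_count, estimated_count):
    """Relative error of a count query answered on anonymized data."""
    if true_count == 0:
        raise ValueError("relative error is undefined for a true count of zero")
    return abs(true_count - estimated_count) / true_count

# Hypothetical anonymized table: ages generalized to ranges, ZIP codes truncated
anonymized = [
    {"age": "20-29", "zip": "476**", "diagnosis": "flu"},
    {"age": "20-29", "zip": "476**", "diagnosis": "asthma"},
    {"age": "30-39", "zip": "479**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "479**", "diagnosis": "diabetes"},
    {"age": "30-39", "zip": "479**", "diagnosis": "flu"},
]

dm = discernibility_metric(anonymized, ["age", "zip"])   # classes of size 2 and 3
err = relative_error(true_count=40, estimated_count=37)  # hypothetical query result
print(dm, err)
```

Under these definitions, smaller equivalence classes give a lower discernibility value, which is consistent with the abstract treating a lower measure as better.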

An Intellectual Methodology for Secure Health Record Mining and Risk Forecasting Using Clustering and Graph-Based Classification

Journal of Circuits System and Computers ◽

10.1142/s0218126621501358 ◽

2020 ◽

pp. 2150135

Author(s):

D. Shiny Irene ◽

V. Surya ◽

D. Kavitha ◽

R. Shankar ◽

S. John Justin Thangaraj

Keyword(s):

Privacy Preservation ◽

Clustering Algorithm ◽

Distance Measure ◽

Information Gain ◽

Personal Information ◽

Pearson Correlation ◽

Research Work ◽

Health Record ◽

Fisher Score ◽

Health Records

The objective of this research work is to analyze and validate health records. Securing the personal information of patients is a challenging issue in health record mining. The risk prediction task is formulated as a multi-class classification problem with the label Cause of Death (COD), which views health-related death as the “biggest risk.” The unlabeled data describes the health conditions of the participants during health examinations, which can range from healthy to severely ill. In addition, the problems of distributed secure data management with privacy preservation are considered. The proposed health record mining proceeds in the following stages. In the first stage, features such as Fisher score, Pearson correlation, and information gain are calculated from the patients' health records, and the average values of the extracted features are computed. In the second stage, features are selected from the averaged features by applying the Euclidean distance measure. In the third stage, the chosen features are clustered using the distance adaptive fuzzy c-means clustering algorithm (DAFCM). In the fourth stage, an entropy-based graph is constructed to classify the data and categorize each patient's record. In the last stage, privacy preservation is applied to the patient's personal information for security. The performance is compared against existing methods and is reported to be better.
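Two of the first-stage features named above can be illustrated with their textbook definitions. This is a hedged sketch only: it uses the standard Fisher score (between-class scatter divided by within-class scatter) and the standard Pearson correlation coefficient, not the authors' implementation, and the function names and toy values are hypothetical. The averaging, DAFCM and graph stages are not shown.

```python
from statistics import fmean, pvariance

def fisher_score(values, labels):
    """Between-class scatter divided by within-class scatter for one feature."""
    overall_mean = fmean(values)
    between, within = 0.0, 0.0
    for label in set(labels):
        group = [v for v, l in zip(values, labels) if l == label]
        between += len(group) * (fmean(group) - overall_mean) ** 2
        within += len(group) * pvariance(group)
    return between / within if within > 0 else float("inf")

def pearson(xs, ys):
    """Pearson correlation coefficient; assumes neither input is constant."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical feature measured for six patients in two outcome classes
feature = [1, 2, 3, 7, 8, 9]
outcome = [0, 0, 0, 1, 1, 1]
score = fisher_score(feature, outcome)
r = pearson([1, 2, 3, 4], [2, 4, 6, 8])
print(score, r)
```

A feature whose class means are far apart relative to its within-class variance gets a high Fisher score, which is why such scores are commonly used to rank features before selection.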

MPDP k-medoids: Multiple partition differential privacy preserving k-medoids clustering for data publishing in the Internet of Medical Things

International Journal of Distributed Sensor Networks ◽

10.1177/15501477211042543 ◽

2021 ◽

Vol 17 (10) ◽

pp. 155014772110425

Author(s):

Zekun Zhang ◽

Tongtong Wu ◽

Xiaoting Sun ◽

Jiguo Yu

Keyword(s):

Clustering Analysis ◽

Data Clustering ◽

Privacy Preservation ◽

Clustering Algorithm ◽

Differential Privacy ◽

Data Availability ◽

System Model ◽

Data Publishing ◽

User Data ◽

Internet Of Medical Things

The tremendous growth of the Internet of Medical Things has led to a surge in medical user data, and medical data publishing can provide users with numerous services. However, careless publishing of the data may lead to severe leakage of users' privacy. In this article, we investigate the problem of privacy-preserving data publishing in the Internet of Medical Things. We present a novel system model for Internet of Medical Things user data publishing, which adopts the proposed multiple partition differential privacy k-medoids clustering algorithm for data clustering analysis to ensure the security of user data. Based on traditional k-medoids clustering, the proposed algorithm optimizes the randomness of selecting initial center points and adds Laplace noise to the clustering process to improve data availability while protecting users' private information. Comprehensive analysis and simulations demonstrate that our method not only meets the requirements of differential privacy but also retains better availability of data clustering.
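The abstract does not say where in the clustering process the Laplace noise is injected, so the sketch below shows only the generic Laplace mechanism, applied to cluster sizes as an assumed example. It is not the MPDP k-medoids algorithm. The names `laplace_noise` and `noisy_cluster_sizes`, the privacy budget and the toy assignments are all hypothetical; the scale 1/epsilon assumes that adding or removing one record changes exactly one cluster count by one.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_cluster_sizes(assignments, k, epsilon, rng):
    """Release the size of each of k clusters with Laplace noise of scale 1/epsilon."""
    counts = [0] * k
    for cluster_id in assignments:
        counts[cluster_id] += 1
    scale = 1.0 / epsilon  # assumed sensitivity of 1 for a histogram of counts
    return [count + laplace_noise(scale, rng) for count in counts]

rng = random.Random(42)                # fixed seed so the sketch is repeatable
assignments = [0, 0, 1, 2, 2, 2]       # hypothetical cluster index of each record
released = noisy_cluster_sizes(assignments, k=3, epsilon=0.5, rng=rng)
print(released)
```

A smaller epsilon gives a larger noise scale, meaning stronger privacy and lower data availability, which is the tradeoff the abstract refers to.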

Publishing Anonymized Set-Valued Data via Disassociation towards Analysis

Future Internet ◽

10.3390/fi12040071 ◽

2020 ◽

Vol 12 (4) ◽

pp. 71

Author(s):

Nancy Awad ◽

Jean-Francois Couchot ◽

Bechara Al Bouna ◽

Laurent Philippe

Keyword(s):

Data Analysis ◽

Privacy Preservation ◽

Clustering Algorithm ◽

Edit Distance ◽

Knowledge Extraction ◽

Data Publishing ◽

Tree Edit Distance ◽

Accurate Analysis ◽

Future Analysis ◽

Mathematical Properties

Data publishing is a challenging task under privacy preservation constraints. To ensure privacy, many anonymization techniques have been proposed. They differ in the mathematical properties they verify and in the functional objectives expected. Disassociation is one of the techniques that aim at anonymizing set-valued datasets (e.g., discrete locations, search and shopping items) while guaranteeing the confidentiality property known as k^m-anonymity. Disassociation separates the items of an itemset into vertical chunks to create ambiguity in the original associations. In a previous work, we defined a new ant-based clustering algorithm for the disassociation technique that preserves some items associated together, called utility rules, throughout the anonymization process, for accurate analysis. In this paper, we examine the disassociated dataset in terms of knowledge extraction. To make data analysis easy on top of the anonymized dataset, we define neighbor datasets, that is, datasets that result from a probabilistic re-association process. To assess the neighborhood notion, set-valued datasets are formalized as trees, and a tree edit distance (TED) is applied directly between these neighbors. Finally, we show experimentally that the neighbors are faithful to knowledge extraction for future analysis.
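The k^m-anonymity property mentioned above can be made concrete with a small check. The sketch assumes the usual definition from the set-valued anonymization literature: every combination of at most m items that occurs in some record must occur in at least k records. It tests a plain set-valued dataset, not the chunked output of disassociation, and the function name and toy baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations

def is_km_anonymous(records, k, m):
    """True if every itemset of size <= m present in the data has support >= k."""
    support = Counter()
    for record in records:
        items = sorted(set(record))  # canonical order so equal itemsets match
        for size in range(1, min(m, len(items)) + 1):
            support.update(combinations(items, size))
    return all(count >= k for count in support.values())

# Hypothetical shopping baskets
baskets = [
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread", "wine"},
    {"bread", "wine"},
]

print(is_km_anonymous(baskets, k=2, m=2))
print(is_km_anonymous(baskets, k=3, m=2))
```

The cost grows quickly with m because every combination of up to m items per record is enumerated, so this is suitable only for small illustrative datasets.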

Outlier-eliminated k-means clustering algorithm based on differential privacy preservation

Applied Intelligence ◽

10.1007/s10489-016-0813-z ◽

2016 ◽

Vol 45 (4) ◽

pp. 1179-1191 ◽

Cited By ~ 15

Author(s):

Qingying Yu ◽

Yonglong Luo ◽

Chuanming Chen ◽

Xintao Ding

Keyword(s):

Privacy Preservation ◽

Clustering Algorithm ◽

Differential Privacy

Privacy preservation problems in online social networks

PsycEXTRA Dataset ◽

10.1037/e502102013-033 ◽

2012 ◽

Author(s):

Marius Kalinauskas

Keyword(s):

Social Networks ◽

Online Social Networks ◽

Privacy Preservation

Distributed Entropy Energy-Efficient Clustering algorithm for cluster head selection (DEEEC)

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189135 ◽

2020 ◽

Vol 39 (6) ◽

pp. 8139-8147

Author(s):

Ranganathan Arun ◽

Rangaswamy Balamurugan

Keyword(s):

Energy Efficient ◽

Clustering Algorithm ◽

Cluster Head ◽

Residual Energy ◽

Energy Utilization ◽

Sensor Nodes ◽

Second Stage ◽

Energy Efficient Clustering ◽

Two Stages ◽

Ch Selection

In Wireless Sensor Networks (WSNs), the energy of sensor nodes is limited. To extend the lifetime of a WSN, it is essential to minimize energy consumption. Selecting a group head, or Cluster Head (CH), from the nodes with higher energy is a well-known method for extending WSN lifetime. Both intra-cluster and inter-cluster communication depend on the CH, so the energy level of the CH determines the lifetime of its cluster. When developing clustering algorithms, the difficult task is to determine the energy consumption of heterogeneous WSNs (HWSNs). Based on Chaotic Firefly Algorithm CH (CFACH) selection, the proposed work is named the “Novel Distributed Entropy Energy-Efficient Clustering Algorithm”, DEEEC for short, for HWSNs. The proposed DEEEC algorithm for CH selection has two main stages. In the first stage, temporary CHs are identified along with their entropy values, using the correlative measure of residual and original energy. In addition, the rotating epoch and its entropy value must be predicted automatically by the sensor nodes. In the second stage, if any member of the cluster has larger residual energy, the temporary CHs are modified in the direction of the deciding set. Through these two stages, nodes with larger energy have a higher probability of becoming CHs. The DEEEC algorithm is simulated in MATLAB. The simulation results show good energy performance and increased lifetime when compared with the traditional clustering protocols currently used in heterogeneous WSNs.
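The two stages described above can be sketched very roughly as follows. This is an illustration under strong assumptions, not the DEEEC algorithm: the "correlative measure" is assumed here to be the ratio of residual to original energy, the entropy value, the rotating epoch and the chaotic firefly component are omitted because the abstract does not specify them, and the function name and node energies are hypothetical.

```python
def select_cluster_head(cluster):
    """Return (temporary_head, final_head) for one cluster.

    `cluster` maps a node id to a (residual_energy, original_energy) pair.
    """
    # Stage 1 (assumed): temporary head has the highest residual/original ratio
    temporary = max(cluster, key=lambda node: cluster[node][0] / cluster[node][1])

    # Stage 2: a member with strictly larger residual energy replaces the head
    head = temporary
    for node, (residual, _original) in cluster.items():
        if residual > cluster[head][0]:
            head = node
    return temporary, head

# Hypothetical heterogeneous cluster: nodes start with different energy budgets
cluster = {
    "n1": (0.9, 1.0),   # 90% of a small budget left
    "n2": (1.5, 2.0),   # 75% of a large budget left, most energy in absolute terms
    "n3": (0.4, 1.0),
}

temporary, head = select_cluster_head(cluster)
print(temporary, head)
```

In this toy cluster the ratio favors `n1` as temporary head, while the second stage hands the role to `n2`, which has more residual energy in absolute terms.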

Handling WSD using Hierarchical Clustering Algorithm with sentences

International Journal of Scientific Research in Science Engineering and Technology ◽

10.32628/ijsrset1841120 ◽

2018 ◽

pp. 83-88

Author(s):

Mohana Priya K ◽

Pooja Ragavi S ◽

Krishna Priya G

Keyword(s):

Hierarchical Clustering ◽

Similarity Measure ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Cosine Similarity Measure ◽

Hierarchical Clustering Algorithm ◽

Multiple Levels ◽

Pos Tagger ◽

Sentence Clustering ◽

The Right

Clustering is the process of grouping objects into subsets that have meaning in the context of a particular problem. It does not rely on predefined classes. It is referred to as an unsupervised learning method because no information is provided about the "right answer" for any of the objects. Many clustering algorithms have been proposed and are used in different applications. Sentence clustering is one of the best clustering techniques. A hierarchical clustering algorithm is applied at multiple levels for accuracy. For tagging and stemming, a POS tagger and the Porter stemmer are used. The WordNet dictionary is used to determine similarity by invoking the Jiang-Conrath and cosine similarity measures. Grouping is performed with respect to the highest similarity value, with the mean as threshold. This paper incorporates many parameters for finding the similarity between words. To identify the disambiguated words, sense identification is performed for the adjectives and a comparison is carried out. SemCor and machine learning datasets are employed. Compared with previous results for WSD, our work shows a substantial improvement, reaching 91.2%.
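The grouping rule described above (merge on the highest similarity, with the mean as threshold) can be sketched as a small agglomerative procedure. This is a simplified stand-in, not the paper's pipeline: it uses bag-of-words cosine similarity only, with no WordNet, Jiang-Conrath measure, POS tagging or stemming, and it assumes single-link merging. The function names and example sentences are hypothetical.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine similarity between the word-count vectors of two sentences."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster_sentences(sentences):
    """Merge the most similar clusters while similarity exceeds the mean pairwise value."""
    n = len(sentences)
    if n < 2:
        return [list(sentences)]
    sim = {(i, j): cosine_similarity(sentences[i], sentences[j])
           for i in range(n) for j in range(i + 1, n)}
    threshold = sum(sim.values()) / len(sim)  # mean similarity as the cut-off
    clusters = [[i] for i in range(n)]

    def link(a, b):  # single-link similarity between two clusters
        return max(sim[min(i, j), max(i, j)] for i in a for j in b)

    while len(clusters) > 1:
        score, x, y = max((link(a, b), x, y)
                          for x, a in enumerate(clusters)
                          for y, b in enumerate(clusters) if x < y)
        if score <= threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [[sentences[i] for i in c] for c in clusters]

sentences = [
    "the bank approved the loan",
    "the bank rejected the loan",
    "we sat on the river bank",
    "we walked along the river bank",
]
groups = cluster_sentences(sentences)
print(groups)
```

With these four sentences the two financial uses of "bank" end up in one group and the two river uses in another, which is the kind of separation word sense disambiguation relies on.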

K-MEANS CLUSTERING ALGORITHM BASED CLASSIFICATION OF SOIL FERTILITY IN NORTH WEST NIGERIA

FUDMA Journal of Sciences ◽

10.33003/fjs-2020-0402-363 ◽

2020 ◽

Vol 4 (2) ◽

pp. 780-787

Author(s):

Ibrahim Hassan Hayatu ◽

Abdullahi Mohammed ◽

Barroon Ahmad Isma’eel ◽

Sahabi Yusuf Ali

Keyword(s):

Soil Fertility ◽

Crop Yield ◽

Clustering Algorithm ◽

Soil Samples ◽

North West ◽

R Programming ◽

Available Information ◽

Northwest Region ◽

The Relationship

Soil fertility determines a plant's development process, which guarantees food sufficiency and the security of lives and properties through bumper harvests. The fertility of soil varies by region, thereby determining the type of crops to be planted. However, there is no repository or other source of information about soil fertility for any region in Nigeria, especially the Northwest of the country. The only available information is soil samples with their attributes, which give little or no information to the average farmer. This has affected crop yield in all the regions, particularly the Northwest region, resulting in lower food production. Therefore, this study aims to classify soil data by fertility in the Northwest region of Nigeria using R programming. Data were obtained from the Department of Soil Science at Ahmadu Bello University, Zaria. The data comprise 400 soil samples with 13 attributes. The relationship between soil attributes was observed based on the data. The K-means clustering algorithm was employed to analyze soil fertility clusters. Four clusters were identified, with cluster 1 having the highest fertility, followed by cluster 2, and fertility decreasing as the cluster number increases. The identification of the most fertile clusters will guide farmers on where best to concentrate when planting their crops in order to improve productivity and crop yield.
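The study used R, and its code is not given here; the following is only a plain-Python sketch of the standard K-means procedure (Lloyd's algorithm) to show the assign-then-update loop. The two-attribute soil samples, the choice of k = 2 and the function name are hypothetical; the study itself reports four clusters on 13 attributes.

```python
import random

def kmeans(points, k, iterations=100, seed=0):
    """Standard K-means: assign points to the nearest centroid, then recompute centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct samples as initial centroids
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sum(
                (a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[nearest].append(p)
        # Mean of each cluster; an empty cluster keeps its previous centroid
        new_centroids = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments are stable, so stop
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical samples: (nitrogen, phosphorus) in arbitrary units
samples = [
    (1.0, 1.0), (1.2, 0.8), (0.9, 1.1),   # low-nutrient group
    (8.0, 8.0), (8.2, 7.9), (7.9, 8.1),   # high-nutrient group
]
centroids, clusters = kmeans(samples, k=2)
print(centroids)
```

In practice the attributes would usually be standardized first, since K-means relies on Euclidean distance and attributes on larger scales would otherwise dominate.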

An Efficient Clustering Algorithm Based on Expectation Maximization Algorithm in Wireless Sensor Network

Oct. 17-19, 2017 Dubai (UAE) ◽

10.15242/dirpub.dir1017011 ◽

2018 ◽

Keyword(s):

Wireless Sensor Network ◽

Sensor Network ◽

Expectation Maximization ◽

Clustering Algorithm ◽

Expectation Maximization Algorithm ◽

Wireless Sensor

Utility-based SK-Clustering algorithm for Privacy Preservation of Anonymized Data in Healthcare
