Data Mining-Based Privacy Preservation Technique for Medical Dataset Over Horizontal Partitioned

Author(s):  
Shivlal Mewada

Data mining techniques extract valuable information from data. Privacy-preserving data mining techniques have recently been widely adopted to secure and protect this information. These techniques convert the original dataset into a protected dataset through swapping, modification, and deletion operations. The proposed technique works in two steps. In the first step, a cloud computing service platform is used to determine the optimal horizontal partitioning of the given data. The K-Means++ algorithm is implemented on the cloud platform to find this partitioning without disclosing information about the cluster centers. The second step consists of data protection and recovery phases. Noise is incorporated into the database to preserve the privacy and semantics of the data, and a seed function is used to protect the original database. The effectiveness of the proposed technique is evaluated on several benchmark medical datasets using encryption time, execution time, accuracy, and F-measure.
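
The abstract gives no code for the partitioning step. As a rough illustration of the K-Means++ part only, the sketch below shows the standard K-Means++ seeding rule followed by ordinary Lloyd iterations in plain Python. Under that rule, the first center is drawn uniformly and each later center is drawn with probability proportional to its squared distance from the nearest chosen center. The sketch does not model the cloud platform or the protection of cluster centers described above. The function names and the `seed`/`iterations` parameters are illustrative assumptions.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seeds(points, k, rng):
    """Pick k initial centers with the K-Means++ seeding rule."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    centers = [rng.choice(points)]  # first center: uniform at random
    while len(centers) < k:
        # D(x)^2: squared distance from each point to its nearest chosen center
        d2 = [min(squared_distance(p, c) for c in centers) for p in points]
        if sum(d2) == 0:  # every point coincides with a chosen center
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's k-means started from K-Means++ seeds; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = kmeans_pp_seeds(points, k, rng)
    labels = []
    for _ in range(iterations):
        # assignment step: index of the nearest center for every point
        labels = [min(range(k), key=lambda j: squared_distance(p, centers[j]))
                  for p in points]
        # update step: each center moves to the mean of its members
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[j])  # empty cluster keeps its old center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels
```

The seeding spreads the initial centers apart, which is why K-Means++ is usually preferred over uniform random initialization for Lloyd's algorithm.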

2016, Vol 12 (12), pp. 4601-4610
Author(s):
D. Palanikkumar, S. Priya, S. Priya

Privacy preservation is a data mining technique that can be applied to databases without violating the privacy of individuals. A sensitive attribute is selected from the numerical data and modified using a data modification technique. The modified data can then be released to any agency. If that agency applies data mining techniques such as clustering or classification for analysis, the results are not affected by the modification. In this technique, the sensitive data is converted into modified data using an S-shaped fuzzy membership function. K-means clustering is applied to both the original and the modified data to obtain clusters. t-closeness requires that the distribution of the sensitive attribute in any equivalence class be close to the distribution of that attribute in the overall table. The Earth Mover's Distance (EMD) is used to measure the distance between the two distributions, which should be no more than a threshold t. Hence privacy is preserved and the accuracy of the data is maintained.
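
A minimal sketch of the two building blocks named above. It assumes a textbook S-shaped membership function with foot `a` and shoulder `b`, since the abstract does not give its parameters or how membership values are mapped back to attribute values. For EMD it assumes the ordered-distance form that the original t-closeness formulation uses for numerical attributes, where the ground distance between the i-th and j-th of m ordered values is |i - j| / (m - 1). The helper names are hypothetical.

```python
def s_shaped(x, a, b):
    """S-shaped fuzzy membership: 0 up to a, 1 from b, quadratic rise in between."""
    if a >= b:
        raise ValueError("require a < b")
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    if x <= (a + b) / 2:
        return 2 * ((x - a) / (b - a)) ** 2
    return 1 - 2 * ((x - b) / (b - a)) ** 2


def ordered_emd(p, q):
    """EMD between distributions p and q over the same m ordered values, with
    ground distance |i - j| / (m - 1): sum of |cumulative differences| / (m - 1)."""
    if len(p) != len(q) or len(p) < 2:
        raise ValueError("p and q need the same length m >= 2")
    cumulative = 0.0
    total = 0.0
    for pi, qi in zip(p, q):
        cumulative += pi - qi
        total += abs(cumulative)
    return total / (len(p) - 1)


def distribution(values, domain):
    """Empirical distribution of values over an ordered domain."""
    return [values.count(v) / len(values) for v in domain]


def satisfies_t_closeness(class_values, table_values, domain, t):
    """True if one equivalence class lies within EMD t of the whole table."""
    return ordered_emd(distribution(class_values, domain),
                       distribution(table_values, domain)) <= t
```

Consider a table whose sensitive attribute takes nine equally frequent ordered values, and a class that holds only the three lowest. `ordered_emd` evaluates to 0.375 (up to floating-point rounding), so that class satisfies t-closeness only for t of at least 0.375.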


With the improvement of information processing and memory capacity, vast amounts of data are collected for various data analysis purposes. Data mining techniques are used to extract knowledge from this data. However, data extracted with data mining techniques may be disclosed publicly, which can lead to breaches of private data. Privacy-preserving data mining is used to protect sensitive information from unwanted or unsanctioned disclosure. In this paper, we analyse the problem of discovering similarity checks for functional dependencies in a given dataset such that applying the (l, d) inference algorithm with generalization can anonymise the microdata without loss of utility [8]. This work presents a functional-dependency-based perturbation approach that hides sensitive information from the user by applying the (l, d) inference model to the dependency attributes based on information gain. The approach works on both categorical and numerical attributes. The perturbed dataset does not affect the original dataset; it maintains the same or very similar patterns as the original. Hence the utility of the application remains high compared with other data mining techniques. The accuracy of the original and perturbed datasets is compared and analysed using data mining classification algorithms.
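
The (l, d) inference model itself is not specified here, but the information-gain criterion used to pick dependency attributes can be illustrated. The sketch below computes IG(Y; A) = H(Y) - Σ_v P(A = v) · H(Y | A = v) for discrete attributes and ranks attribute columns by it. It assumes numerical attributes have already been discretized, and the function names and toy column names are assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(attribute, labels):
    """IG(Y; A) = H(Y) - sum over values v of P(A = v) * H(Y | A = v)."""
    n = len(labels)
    groups = {}
    for a, y in zip(attribute, labels):
        groups.setdefault(a, []).append(y)
    conditional = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return entropy(labels) - conditional


def rank_by_information_gain(columns, labels):
    """Attribute names sorted from most to least informative about labels."""
    return sorted(columns, key=lambda name: information_gain(columns[name], labels),
                  reverse=True)
```

An attribute that determines the class completely scores H(Y), and one independent of the class scores 0. The most informative attributes are the ones a perturbation scheme most needs to protect.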


2018, Vol 5 (3), pp. 1-20
Author(s):
Sharmila Subudhi, Suvasini Panigrahi

This article presents a novel approach for fraud detection in automobile insurance claims by applying various data mining techniques. First, the most relevant attributes are chosen from the original dataset using an evolutionary-algorithm-based feature selection method. A test set is then extracted from the selected attribute set, and the remaining dataset is undersampled using the Possibilistic Fuzzy C-Means (PFCM) clustering technique. The 10-fold cross-validation method is then used on the balanced dataset to train and validate a group of Weighted Extreme Learning Machine (WELM) classifiers generated from various combinations of WELM parameters. Finally, the test set is applied to the best-performing model for classification. The efficacy of the proposed system is illustrated through several experiments on a real-world automobile insurance fraud dataset. In addition, a comparative analysis with another approach demonstrates the superiority of the proposed system.
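
The abstract does not describe how the 10-fold cross-validation or the WELM parameter search is implemented. The sketch below shuffles the sample indices once, splits them into ten folds, and picks the parameter combination with the highest mean validation score. It assumes a caller-supplied scoring function, the hypothetical `evaluate`, that trains one WELM configuration on the training indices and returns its validation score. The WELM itself is not shown.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("need 2 <= k <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, then cut into folds
    # fold sizes differ by at most one and sum to n_samples
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


def select_best(param_grid, n_samples, evaluate, k=10):
    """Return the parameter setting with the highest mean validation score.

    evaluate(params, train, validation) is a hypothetical, caller-supplied
    function that trains one model and returns its validation score.
    """
    def mean_score(params):
        scores = [evaluate(params, train, val)
                  for train, val in k_fold_indices(n_samples, k)]
        return sum(scores) / len(scores)
    return max(param_grid, key=mean_score)
```

Here `param_grid` would be a list of settings such as `{"hidden": 50, "C": 1.0}`. These keys are illustrative only. Every sample is validated exactly once, so each setting is judged on the whole balanced dataset.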


2021
Author(s):
Rohit Ravindra Nikam, Rekha Shahapurkar

Data mining is a technique for extracting necessary information from large datasets. Privacy protection in data mining is about hiding sensitive information or identities to prevent security breaches without losing data usability. Sensitive data contains confidential information about individuals, businesses, and governments, which must not be shared or published without their agreement. Preserving privacy in data mining has become a critical research area. Various evaluation metrics are used to assess privacy-preserving data mining techniques, such as time efficiency, data utility, and degree of complexity or resistance to data mining techniques. Social media and smartphones produce tons of data every minute. This voluminous data from different sources can be processed and analyzed for decision making, but data analytics is vulnerable to privacy breaches. One such data analytics framework is the recommendation system. E-commerce sites such as Amazon and Flipkart commonly use recommendation systems to recommend items to customers based on their purchasing habits, which can lead to customers being characterized. This paper presents various privacy preservation techniques used by existing researchers, such as data anonymization, data randomization, generalization, and data permutation. We also analyze the gaps between various processes and privacy preservation methods and illustrate how to overcome such issues with new innovative methods. Finally, we summarize the outcomes of the entire literature.
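
As a rough illustration of three of the surveyed techniques, the sketch below applies them to plain Python records:

- generalization, which turns an exact age into a 10-year band;
- randomization, which adds zero-mean Gaussian noise;
- data permutation, which shuffles a sensitive column across records.

The column names, band width, noise scale, and seeds are arbitrary assumptions, not parameters taken from the surveyed work.

```python
import random


def generalize_age(age, width=10):
    """Generalization: replace an exact age by its band, e.g. 37 -> '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def randomize(values, scale=1.0, seed=0):
    """Randomization: add zero-mean Gaussian noise with std dev `scale` to each value."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, scale) for v in values]


def permute_column(records, column, seed=0):
    """Data permutation: shuffle one column across records.

    This breaks the link between each individual and their sensitive value
    while leaving the column's overall distribution unchanged.
    """
    rng = random.Random(seed)
    values = [r[column] for r in records]
    rng.shuffle(values)
    return [{**r, column: v} for r, v in zip(records, values)]
```

Each technique trades some utility for privacy differently. Generalization coarsens values, randomization perturbs them, and permutation keeps every value but detaches it from its record.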


Author(s):
Darshana H. Patel, Saurabh Shah, Avani Vasant

With the advent of various technologies and digitization, data mining has become increasingly popular for analysis and growth in several fields. However, pattern discovery by data mining may also disclose personal information about an individual or organization. People today are very concerned about sensitive information they do not want to share, so it is essential to protect private data. This paper focuses on preserving sensitive information while maintaining the efficiency that privacy preservation degrades. Privacy is preserved by anonymization. Efficiency is improved by optimization techniques, since several advanced optimization techniques are nowadays used to solve problems in different areas. Furthermore, privacy-preserving association classification has been implemented on various datasets with accuracy as the evaluation parameter. It has been concluded that as privacy increases, accuracy degrades due to data transformation. Hence, optimization techniques are applied to improve the accuracy. The proposed approach, which uses a genetic algorithm to optimize association rules, has been compared with existing optimization techniques, namely particle swarm optimization, cuckoo search, and animal migration optimization. The proposed approach requires about 20-80 milliseconds more execution time, depending on the dataset, but improves accuracy by 5-6% compared with the existing approaches.
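
The abstract does not state which measures the genetic algorithm optimizes. For orientation only, the sketch below computes support and confidence, the standard measures used to evaluate association classification rules (antecedent -> class), and filters rules by minimum thresholds. It assumes transactions are Python sets of attribute-value items plus a class item. The item strings and thresholds are illustrative.

```python
def support_confidence(transactions, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent.

    support    = |T containing antecedent and consequent| / |T|
    confidence = |T containing antecedent and consequent| / |T containing antecedent|
    """
    if not transactions:
        raise ValueError("no transactions")
    antecedent, consequent = set(antecedent), set(consequent)
    n_antecedent = sum(1 for t in transactions if antecedent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_both / len(transactions)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


def filter_rules(transactions, rules, min_support, min_confidence):
    """Keep the (antecedent, consequent) rules meeting both thresholds."""
    kept = []
    for antecedent, consequent in rules:
        s, c = support_confidence(transactions, antecedent, consequent)
        if s >= min_support and c >= min_confidence:
            kept.append((antecedent, consequent))
    return kept
```

Anonymization that generalizes or perturbs item values changes these counts. This is one way the accuracy loss reported above can arise.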


Experts from various sectors use data mining techniques to discover useful information in huge amounts of data and improve the quality of their outcomes. The presence of irrelevant and redundant features affects the accuracy of mining results. Before any mining technique is applied, the data needs to be preprocessed. Feature selection, a preprocessing step in data mining, improves mining performance. In this paper, we propose a new two-step algorithm for unsupervised feature selection. In the first step, the Laplacian Score is used to select the important features. In the second step, Symmetric Uncertainty is used to remove redundant features. The experimental results show that the proposed algorithm outperforms the Laplacian Score algorithm.
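
A sketch of the second (redundancy-removal) step. It assumes discrete feature values and assumes the Laplacian Score ranking from the first step is already available as an ordered list of feature names; the Laplacian Score itself is not shown. Symmetric uncertainty is computed as SU(X, Y) = 2 · I(X; Y) / (H(X) + H(Y)). The 0.9 threshold is a hypothetical choice, since the abstract gives none.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), which lies in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 1.0  # both features constant: treat them as fully redundant
    hxy = entropy(list(zip(x, y)))  # joint entropy H(X, Y)
    mutual_info = hx + hy - hxy     # I(X; Y) = H(X) + H(Y) - H(X, Y)
    return 2 * mutual_info / (hx + hy)


def remove_redundant(ranked_features, columns, threshold=0.9):
    """Walk features in ranked order, dropping any whose SU with an
    already-kept feature reaches the (assumed) threshold."""
    kept = []
    for name in ranked_features:
        if all(symmetric_uncertainty(columns[name], columns[k]) < threshold for k in kept):
            kept.append(name)
    return kept
```

Because features are visited in ranked order, the higher-scoring member of each redundant pair is the one kept.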


2016, Vol 120 (1234), pp. 1849-1866
Author(s):
A.B. Arockia Christopher, V. Shunmughavel Vivekanandam, A.B. Antony Anderson, S. Markkandeyan, V. Sivakumar

Data mining is an iterative process in which progress is defined by discovery, through either automatic or manual methods. A data cleaning procedure is proposed to improve the quality of classification tasks in the knowledge discovery process by taking into account both redundant and conflicting data. The redundancy check is performed on the original dataset, and the resultant dataset is preserved. This resultant dataset is then checked for conflicting data. Any conflicts found are corrected and updated in the original aircraft dataset. The updated dataset is then classified using a variety of classifiers, such as Bayes, functions, lazy, MISC, rules, and decision trees. The performance of these classifiers on the updated dataset is examined. The results show a significant improvement in classification accuracy once redundancy is removed and the corrected conflicts are updated in the original dataset. This paper aims to address how data mining techniques can be used to understand complex system accidents in the aviation domain. Decision trees are considered one of the most powerful and popular approaches in knowledge discovery and data mining. The objective is to develop a classification model for aviation risk investigation and reduction using a decision tree induction method that enhances the ability to form decision trees, and thereby to show that the classification accuracy of decision trees is greater. Different feature selectors are used in this study to reduce the number of initial attributes.
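
The cleaning procedure can be sketched as follows, assuming each record is a (feature tuple, class label) pair. The abstract does not say how conflicts are corrected. Relabelling conflicting records with their majority label in the original dataset is an assumption made here for illustration, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def remove_duplicates(records):
    """Redundancy check: keep the first copy of each identical (features, label) record."""
    seen = set()
    unique = []
    for features, label in records:
        key = (tuple(features), label)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


def find_conflicts(records):
    """Conflict check: feature vectors that occur with more than one class label."""
    labels = defaultdict(set)
    for features, label in records:
        labels[tuple(features)].add(label)
    return {features for features, seen in labels.items() if len(seen) > 1}


def correct_conflicts(records, conflicts):
    """Assumed correction rule: relabel each conflicting record with its majority
    label in `records` (ties go to the label encountered first)."""
    votes = defaultdict(Counter)
    for features, label in records:
        if tuple(features) in conflicts:
            votes[tuple(features)][label] += 1
    corrected = []
    for features, label in records:
        key = tuple(features)
        if key in conflicts:
            label = votes[key].most_common(1)[0][0]
        corrected.append((key, label))
    return corrected


def clean(records):
    """Detect conflicts on de-duplicated data, correct them in the original
    records, then de-duplicate the corrected records."""
    conflicts = find_conflicts(remove_duplicates(records))
    return remove_duplicates(correct_conflicts(records, conflicts))
```

Conflicts are detected after de-duplication, as in the procedure above. The vote is taken over the original records because every duplicate-free conflicting pair would otherwise tie.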

