Privacy-Preserving Classification of Customer Data without Loss of Accuracy

Author(s):  
Zhiqiang Yang ◽  
Sheng Zhong ◽  
Rebecca N. Wright
Author(s):  
Dilip Kumar Sharma ◽  
Sarika Lohana ◽  
Saurabh Arora ◽  
Ashutosh Dixit ◽  
Mohit Tiwari ◽  
...  

2013 ◽  
Vol 4 (3) ◽  
pp. 813-820
Author(s):  
Kiran P ◽  
Kavya N. P.

The core objective of privacy-preserving data mining is to preserve the confidentiality of individuals even after mining. The basic advantage of personalized privacy preservation is that the information loss is much lower than with other privacy preservation algorithms. These algorithms, however, have not been designed for specific mining algorithms. SW-SDF personalized privacy preservation uses two flags, SW and SDF. SW assigns a weight to the sensitive attribute, and SDF is a sensitive disclosure flag accepted from the individual. In this paper we design an algorithm that uses SW-SDF personalized privacy preservation for data classification. This method ensures both privacy and classification of the data.


2015 ◽  
Vol 4 (4) ◽  
pp. 163
Author(s):  
NUR FAIZA ◽  
I WAYAN SUMARJAYA ◽  
I GUSTI AYU MADE SRINADI

The aim of this research is to obtain classification results and to compare the magnitude of misclassification of the QUEST and CHAID methods in classifying customers of the Adira Kredit Elektronik branch in Denpasar. QUEST (Quick, Unbiased, Efficient Statistical Trees) and CHAID (Chi-squared Automatic Interaction Detection) are nonparametric methods that produce tree diagrams that are easy to interpret. The QUEST and CHAID classification methods lead to the following conclusions: 1) the QUEST method produces three groups that predict customers into the current category, whereas the CHAID method produces four groups that also predict customers into the current category; 2) both methods achieve their highest classification accuracy for customers in the current category, who share similar characteristics; 3) both methods have the same degree of accuracy in classifying the customer data of the Adira Kredit Elektronik branch in Denpasar.


Author(s):  
Boudheb Tarik ◽  
Elberrichi Zakaria

Classifying data means automatically assigning predefined classes to data. It is one of the main applications of data mining. Having complete access to all data is critical for building accurate models. Data can be highly sensitive, such as biomedical data, which cannot be disclosed or shared with third parties because doing so can harm individuals and organizations. The challenge is how to preserve both the privacy and the usefulness of data. Privacy-preserving classification addresses this problem. Collaborative models are constructed over networks without violating the data owners' privacy. In this article, the authors address two problems: privacy-preserving deduplication of identical records and privacy-preserving classification. They propose a randomized hash technique for deduplication and an enhanced privacy-preserving classification of biomedical data over horizontally distributed data based on two homomorphic encryption schemes. No private, intermediate, or final results are disclosed. Experiments show that their solution is efficient and secure without loss of accuracy.
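The abstract does not name the two homomorphic encryption schemes it relies on. As a rough illustration of the general idea only, the sketch below assumes a Paillier-style additively homomorphic scheme: each data owner encrypts a local class count, the ciphertexts are combined without decryption, and only the aggregate is ever decrypted. The function names, the toy primes, and the example counts are all hypothetical, and the key size is far too small to be secure.

```python
import math
import random

def keygen(p=1009, q=1013):
    """Toy Paillier key generation (tiny primes, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)

def encrypt(n, m, rng):
    """Encrypt integer m (0 <= m < n) under public key n."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)

def decrypt(n, priv, c):
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

rng = random.Random(0)
n, priv = keygen()

# Hypothetical per-site counts of records in one class (horizontally
# partitioned data: each site holds different records, same attributes).
local_counts = [120, 75, 230]
ciphertexts = [encrypt(n, count, rng) for count in local_counts]

# An aggregator combines ciphertexts without seeing any local count.
total_cipher = ciphertexts[0]
for c in ciphertexts[1:]:
    total_cipher = add_encrypted(n, total_cipher, c)

total = decrypt(n, priv, total_cipher)
print(total)
```

Aggregated counts of this kind are what a count-based classifier would need from the combined data, which is one way a collaborative model could match the accuracy of a centrally trained one. Whether the article's protocol works this way is not stated in the abstract.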


2020 ◽  
Vol 14 (1) ◽  
pp. 34
Author(s):  
Nina Sulistiyowati ◽  
Mohamad Jajuli

Classification of data with unbalanced classes is a major problem in machine learning and data mining. When working with unbalanced data, almost all classification algorithms produce much higher accuracy for majority classes than for minority classes. This research implements the Synthetic Minority Over-sampling Technique (SMOTE) to overcome class imbalance in the credit customer data of the Rawamerta teachers' cooperative. The research methodology uses SEMMA, with the stages Sample, Explore, Modify, Model, and Assess. The Sample stage selects the data of the Rawamerta Teachers Cooperative credit customers for 2015-2017, a total of 878 records with the attributes income, total deposits, loan amount, duration of installments, services, installments, and credit status. The Explore stage finds that the current class is the majority class, with 813 records, while the traffic class is the minority class, with 65 records. The data thus shows an imbalance between the two classes. The Modify stage performs the 500% SMOTE process. The Model stage classifies using Naïve Bayes. Naïve Bayes modeling with SMOTE classified 1131 records correctly and 72 incorrectly, while without SMOTE it classified 818 records correctly and 60 incorrectly.

Keywords: Naïve Bayes, SMOTE, unbalanced data
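The counts in this abstract can be checked with a short sketch. It assumes the usual SMOTE procedure, in which each minority record produces synthetic records by interpolating toward one of its nearest minority neighbours, and it reads "500%" as five synthetic records per original. The `smote` function, the value k=5, and the random two-dimensional placeholder points are assumptions for illustration; they are not the study's implementation or data.

```python
import math
import random

def smote(minority, n_percent=500, k=5, seed=0):
    """Create synthetic minority samples by interpolating toward neighbours."""
    rng = random.Random(seed)
    per_sample = n_percent // 100  # 500% -> 5 synthetic points per record
    synthetic = []
    for i, x in enumerate(minority):
        # k nearest minority-class neighbours of x, excluding x itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        for _ in range(per_sample):
            nb = rng.choice(neighbours)
            gap = rng.random()  # position along the segment from x to nb
            synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic

# Class sizes reported in the abstract
majority_count, minority_count = 813, 65

# Placeholder feature vectors standing in for the 65 minority records
data_rng = random.Random(1)
minority = [(data_rng.random(), data_rng.random()) for _ in range(minority_count)]

synthetic = smote(minority)
total_after = majority_count + minority_count + len(synthetic)
print(len(synthetic), total_after)  # 325 1203

# Accuracy implied by the reported correct/incorrect counts
acc_with_smote = 1131 / (1131 + 72)
acc_without_smote = 818 / (818 + 60)
print(f"{acc_with_smote:.3f} {acc_without_smote:.3f}")
```

Under this reading, 65 minority records yield 325 synthetic ones, for 1203 records in total, which matches the 1131 + 72 records reported with SMOTE. The reported counts imply an overall accuracy of about 94.0% with SMOTE and about 93.2% without.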


Author(s):  
Amanah Saeroni ◽  
Memi Nor Hayati ◽  
Rito Goejantoro

Classification is a technique for building a model from data whose class membership is already known. The resulting model is then used to classify new objects. The K-Nearest Neighbor (K-NN) algorithm classifies a new object based on its K nearest neighbors. Fisher discriminant analysis is a multivariate technique that separates objects into different groups by forming a discriminant function, which is then used to allocate new objects to groups. This research aims to determine the results of classifying customer premium payment status using the K-NN method and Fisher discriminant analysis, and to compare the accuracy of the two methods on insurance customer premium payment status. The data used is the insurance customer data of PT. Prudential Life Samarinda in 2019, with current or non-current premium payment status and four independent variables: age, duration of premium payment, income, and premium payment amount. The comparison of accuracy between the two analyses shows that the K-NN method has a higher level of accuracy than Fisher discriminant analysis for classifying insurance customers' premium payment status. The misclassification rate measured by the APER (Apparent Error Rate) is 15% for the K-NN method and 30% for Fisher discriminant analysis.
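A minimal sketch of K-NN voting and of the APER follows, with the APER taken as the number of misclassified objects divided by the number of objects evaluated. The two-feature toy records (age, years of premium payment), the value k=3, and the helper names are hypothetical; the study's data, its choice of K, and its evaluation split are not given in the abstract.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training points."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: math.dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

def aper(y_true, y_pred):
    """Apparent error rate: misclassified objects / objects evaluated."""
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return wrong / len(y_true)

# Hypothetical training records: (age, years of premium payment)
train_X = [(25, 1), (30, 2), (45, 10), (50, 12), (28, 1), (48, 11)]
train_y = ["non-current", "non-current", "current",
           "current", "non-current", "current"]

# Hypothetical evaluation records with their known statuses
eval_X = [(27, 2), (47, 9), (35, 9)]
eval_y = ["non-current", "current", "current"]

predictions = [knn_predict(train_X, train_y, x) for x in eval_X]
error_rate = aper(eval_y, predictions)
print(predictions, error_rate)
```

In practice the features would normally be rescaled before computing distances, since variables such as income and age are on very different scales and the larger one would otherwise dominate the neighbour search.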



