Privacy Preservation using (L, D) Inference Model Based on Dependency Identification Information Gain

With improvements in information processing and memory capacity, vast amounts of data are collected for various data analysis purposes. Data mining techniques are used to extract useful knowledge from this data. In the process of extracting data with data mining techniques, however, the data is exposed publicly, and this leads to breaches of private data. Privacy-preserving data mining is used to protect sensitive information from unwanted or unsanctioned disclosure. In this paper, we analyse the problem of discovering similarity checks for functional dependencies from a given dataset, such that applying the (l, d) inference algorithm with generalization can anonymise the microdata without loss of utility [8]. This work presents a functional-dependency-based perturbation approach that hides sensitive information from the user by applying the (l, d) inference model to the dependency attributes based on Information Gain. The approach works on both categorical and numerical attributes. The perturbed dataset does not affect the original dataset; it maintains the same or very comparable patterns as the original dataset. Hence the utility of the application remains high when compared to other data mining techniques. The accuracy of the original and perturbed datasets is compared and analysed using tools and a data mining classification algorithm.
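
As an illustration of the information gain measure that the abstract relies on for the dependency attributes, the following minimal Python sketch computes the gain of one attribute with respect to another. The toy records, the attribute names (`zip`, `age`, `disease`) and the helper names are assumptions made for illustration; they are not taken from the paper, and the (l, d) inference step itself is not shown.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(values)
    return -sum((c / total) * log2(c / total) for c in Counter(values).values())


def information_gain(rows, attr, target):
    """Reduction in entropy of `target` obtained by splitting `rows` on `attr`."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[target])
    # Weighted entropy of the target inside each group of equal `attr` values.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Hypothetical microdata: `zip` fully determines `disease`, `age` does not.
rows = [
    {"zip": "A", "age": "young", "disease": "flu"},
    {"zip": "A", "age": "old", "disease": "flu"},
    {"zip": "B", "age": "young", "disease": "cold"},
    {"zip": "B", "age": "old", "disease": "cold"},
]

gain_zip = information_gain(rows, "zip", "disease")
gain_age = information_gain(rows, "age", "disease")
```

In this toy table the gain of `zip` equals the full entropy of `disease`, which is the signature of a functional dependency, while the gain of `age` is zero. An attribute with high gain towards a sensitive attribute would be a natural candidate for perturbation under this reading of the abstract.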

2014 ◽  
Vol 5 (3) ◽  
pp. 11-28
Author(s):  
Ljiljana Kašćelan ◽  
Vladimir Kašćelan ◽  
Milijana Novović-Burić

This paper proposes a data mining approach for risk assessment in car insurance. Standard methods classify policies into a large number of tariff classes and assess risk on that basis. By applying data mining techniques, it is possible to obtain functional dependencies between the level of risk and the risk factors, as well as better predictions. The case study data show that data mining techniques can predict claim sizes and the occurrence of claims more accurately than the standard methods, and this provides the basis for calculating the net risk premium and for risk classification. The paper also discusses the advantages of data mining methods over standard methods for risk assessment in car insurance, as well as the specificities of the obtained results that arise from a small insurance market such as the one in Montenegro.


2020 ◽  
Author(s):  
Daniela De Souza Gomes ◽  
Marcos Henrique Fonseca Ribeiro ◽  
Giovanni Ventorim Comarela ◽  
Gabriel Philippe Pereira

High failure rates are a worrying and relevant problem in Brazilian universities. Using a data set of student transcripts, we performed a case study for both general and Computer Science contexts, in which data mining techniques were used to find patterns concerning failures. The knowledge acquired can be used for better educational administration and also to build intelligent systems that support students’ decision making.


Big Data ◽  
2016 ◽  
pp. 2028-2046
Author(s):  
Ljiljana Kašćelan ◽  
Vladimir Kašćelan ◽  
Milijana Novović-Burić

This paper proposes a data mining approach for risk assessment in car insurance. Standard methods classify policies into a large number of tariff classes and assess risk on that basis. By applying data mining techniques, it is possible to obtain functional dependencies between the level of risk and the risk factors, as well as better predictions. The case study data show that data mining techniques can predict claim sizes and the occurrence of claims more accurately than the standard methods, and this provides the basis for calculating the net risk premium and for risk classification. The paper also discusses the advantages of data mining methods over standard methods for risk assessment in car insurance, as well as the specificities of the obtained results that arise from a small insurance market such as the one in Montenegro.


Author(s):  
K. Abumani ◽  
R. Nedunchezhian

Data mining techniques have been widely used for extracting non-trivial information from massive amounts of data. They help in strategic decision-making as well as in many other applications. However, alongside its usefulness, data mining also has a few drawbacks. Sensitive information contained in the database may be exposed by data mining tools. Different approaches are being used to hide the sensitive information. The work proposed in this article applies a novel method to access the generating transactions in the transactional database with minimum effort. It helps in reducing the time complexity of any hiding algorithm. The theoretical and empirical analysis of the algorithm shows that, with the proposed work, association rule hiding is performed more quickly than with other algorithms.


2022 ◽  
pp. 154-178
Author(s):  
Siddhartha Kumar Arjaria ◽  
Vikas Raj ◽  
Sunil Kumar ◽  
Priyanshu Shrivastava ◽  
Monu Kumar ◽  
...  

Skin disease rates have been increasing over the past few decades. This has led to both fatal and non-fatal disabilities all around the world, especially in areas where medical resources are inadequate. Early diagnosis of skin diseases significantly increases the chances of a cure. Therefore, this work compares six machine learning algorithms, namely KNN, random forest, neural network, naïve Bayes, logistic regression, and SVM, for the prediction of skin diseases. Information gain, gain ratio, Gini decrease, chi-square, and ReliefF are used to rank the features. This work comprises the introduction, literature review, and proposed methodology parts. In this research paper, a new method of analyzing skin disease is proposed in which six different data mining techniques are used to develop an ensemble method that integrates all six techniques into a single one. The ensemble method applied to the dermatology dataset gives an improved result, with 94% accuracy, in comparison to the other classifier algorithms and hence is more effective in this area.


Author(s):  
Shyue-Liang Wang ◽  
Ju-Wen Shen ◽  
Tuzng-Pei Hong

Mining functional dependencies (FDs) from databases has been identified as an important database analysis technique. It has received considerable research interest in recent years. However, most current data mining techniques for determining functional dependencies deal only with crisp databases. Although various forms of fuzzy functional dependencies (FFDs) have been proposed for fuzzy databases, they emphasized conceptual viewpoints and only a few mining algorithms are given. In this research, we propose methods to validate and incrementally search for FFDs from similarity-based fuzzy relational databases. For a given pair of attributes, the validation of FFDs is based on fuzzy projection and fuzzy selection operations. In addition, the property that FFDs are monotonic in the sense that r1 ⊆ r2 implies FDa(r1) ⊇ FDa(r2) is shown. An incremental search algorithm for FFDs based on this property is then presented. Experimental results showing the behavior of the search algorithm are discussed.
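
The crisp analogue of the monotonicity property is easy to check in code: a functional dependency that holds on a relation also holds on any subset of its tuples. The sketch below tests a crisp dependency X -> Y over a list of records. It is not the authors' fuzzy validation procedure, which the abstract describes as based on fuzzy projection and fuzzy selection; the function name and the sample attributes are hypothetical.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the crisp functional dependency lhs -> rhs holds on rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for a key; later rows must match it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical relation r2 and a subset r1 of its tuples.
r2 = [
    {"dept": "sales", "manager": "Ann"},
    {"dept": "ops", "manager": "Bob"},
    {"dept": "sales", "manager": "Cid"},  # violates dept -> manager
]
r1 = r2[:2]
```

Here `dept -> manager` holds on `r1` but fails once the third tuple is added, while `manager -> dept` holds on `r2` and therefore also on `r1`. This is the kind of pruning an incremental search can exploit as tuples arrive.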


2018 ◽  
Vol 5 (3) ◽  
pp. 1-20 ◽  
Author(s):  
Sharmila Subudhi ◽  
Suvasini Panigrahi

This article presents a novel approach for fraud detection in automobile insurance claims by applying various data mining techniques. Initially, the most relevant attributes are chosen from the original dataset by using a feature selection method based on an evolutionary algorithm. A test set is then extracted from the selected attribute set, and the remaining dataset is subjected to the Possibilistic Fuzzy C-Means (PFCM) clustering technique for undersampling. The 10-fold cross validation method is then used on the balanced dataset for training and validating a group of Weighted Extreme Learning Machine (WELM) classifiers generated from various combinations of WELM parameters. Finally, the test set is applied to the best performing model for classification. The efficacy of the proposed system is illustrated by conducting several experiments on a real-world automobile insurance fraud dataset. In addition, a comparative analysis with another approach supports the superiority of the proposed system.


2013 ◽  
Vol 5 (1) ◽  
pp. 66-83 ◽  
Author(s):  
Iman Rahimi ◽  
Reza Behmanesh ◽  
Rosnah Mohd. Yusuff

The objective of this article is to evaluate and assess the efficiency of poultry meat farms, as a case study, with a new method. The poultry farm industry is one of the most important sub-sectors in comparison to others. The purpose of this study is to predict and assess the efficiency of poultry farms as decision making units (DMUs). Although several methods have been proposed for solving this problem, the authors need a methodology that discriminates performance more powerfully. Their methodology comprises data envelopment analysis and several data mining techniques, such as artificial neural network (ANN), decision tree (DT), and cluster analysis (CA). As a case study, data for the analysis were collected from 22 poultry companies in Iran. Moreover, because the data set is small and data mining techniques call for large data sets, the authors employed the k-fold cross validation method to validate their model. After assessing the efficiency of each DMU and clustering the DMUs, followed by applying the model and presenting decision rules, the approach results in a precise and accurate optimizing technique.
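
For readers unfamiliar with k-fold cross validation, the sketch below shows how a small data set such as the 22 companies mentioned above could be split into folds so that every record is used for testing exactly once. The choice of k = 5 and the function name are assumptions; the abstract does not state the value of k or the tooling the authors used.

```python
def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k (train, test) pairs of contiguous folds."""
    # The first (n_samples % k) folds get one extra sample so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


# Hypothetical setting: 22 records, 5 folds.
folds = k_fold_indices(22, 5)
test_sizes = [len(test) for _, test in folds]
```

In practice the records would usually be shuffled before splitting; that step is omitted here to keep the sketch deterministic.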


Author(s):  
Dominique Haughton ◽  
Guangying Hua ◽  
Danny Jin ◽  
John Lin ◽  
Qizhi Wei ◽  
...  

Purpose – The purpose of this paper is to propose data mining techniques to model the return on investment from various types of promotional spending to market a drug and then use the model to draw conclusions on how the pharmaceutical industry might go about allocating promotion expenditures in a more efficient manner, potentially reducing costs to the consumer. The main contributions of the paper are two-fold. First, it demonstrates how to undertake a promotion mix optimization process in the pharmaceutical context and carry it through from the beginning to the end. Second, the paper proposes using directed acyclic graphs (DAGs) to help unravel the direct and indirect effects of various promotional media on sales volume. Design/methodology/approach – A synthetic data set was constructed to prototype the proposed data mining techniques, and two analysis approaches were investigated. Findings – The two methods were found to yield insights into the problem of the promotion mix in the context of the healthcare industry. First, a factor analysis followed by a regression analysis and an optimization algorithm applied to the resulting equation were used. Second, a DAG was used to unravel direct and indirect effects of promotional expenditures on new prescriptions. Research limitations/implications – The data are synthetic and do not incorporate any time autocorrelations. Practical implications – The promotion mix optimization process is demonstrated from the beginning to the end, and the issue of negative coefficients in promotion mix models is addressed. In addition, a method is proposed to identify direct and indirect effects on new prescriptions. Social implications – A better allocation of promotional expenditures has the potential for reducing the cost of healthcare to consumers.
Originality/value – The contributions of the paper are two-fold. First, for the first time in the literature (to the best of the authors’ knowledge), the authors have undertaken a promotion mix optimization process and have carried it through from the beginning to the end. Second, the authors propose the use of DAGs to help unravel the effects of various promotion media on sales volume, notably direct and indirect effects.


2014 ◽  
Vol 998-999 ◽  
pp. 842-845 ◽  
Author(s):  
Jia Mei Guo ◽  
Yin Xiang Pei

Association rule extraction is one of the important goals of data mining and analysis. To address the information loss caused by crisp partitioning of numerical attributes, in this article we put forward a fuzzy association rule mining method based on fuzzy logic. First, we use c-means clustering to generate fuzzy partitions and eliminate redundant data; we then map the original data set into fuzzy intervals; finally, we extract the fuzzy association rules from the fuzzy data set to provide a basis for proper decision-making. Results show that this method can effectively improve the efficiency of data mining and the semantic visualization and credibility of association rules.
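
To make the mapping from numerical values to fuzzy intervals concrete, here is a minimal sketch of the standard fuzzy c-means membership formula for a single one-dimensional value, given cluster centres that are assumed to be already computed and distinct. The centres, the fuzzifier m = 2 and the function name are illustrative assumptions, not values from the article.

```python
def fuzzy_memberships(x, centers, m=2.0):
    """Membership of value x in each cluster, using the fuzzy c-means formula
    u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1)), with d_i = |x - center_i|."""
    dists = [abs(x - c) for c in centers]
    # A value that coincides with a centre belongs fully to that cluster
    # (centres are assumed distinct, so at most one distance is zero).
    if 0.0 in dists:
        return [1.0 if d == 0.0 else 0.0 for d in dists]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** exponent for d_j in dists) for d_i in dists]


# Hypothetical centres for an attribute such as age: "low", "medium", "high".
centers = [20.0, 40.0, 60.0]
u = fuzzy_memberships(25.0, centers)
```

The memberships of a value sum to one, so a record with value 25 belongs mostly to the first interval and partly to the others, instead of being forced into a single crisp bin.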

