Using decision trees to summarize associative classification rules

2009 ◽  
Vol 36 (2) ◽  
pp. 2338-2351 ◽  
Author(s):  
Yen-Liang Chen ◽  
Lucas Tzu-Hsuan Hung

Entropy ◽  
2019 ◽  
Vol 21 (1) ◽  
pp. 66 ◽  
Author(s):  
Georgios Feretzakis ◽  
Dimitris Kalles ◽  
Vassilios S. Verykios

Data sharing among organizations has become an increasingly common procedure in several sectors such as advertising, marketing, electronic commerce, banking, and insurance. However, any organization will most likely try to keep some patterns hidden once it shares its datasets with others. This paper focuses on preserving the privacy of sensitive patterns when inducing decision trees. We adopt a record augmentation approach to hide critical classification rules in binary datasets. Such a hiding methodology is preferable to alternatives such as output perturbation or cryptographic techniques, which limit the usability of the data, because the raw data themselves remain available for public use. We propose a look-ahead technique using linear Diophantine equations to add the appropriate number of instances while maintaining the initial entropy of the nodes. This method can be used to hide one or more decision tree rules optimally.
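The entropy-preserving constraint behind the look-ahead technique can be illustrated with a small sketch. For a node of a binary dataset holding a records of one class and b of the other, adding x and y new records of the respective classes leaves the node's entropy unchanged exactly when the class ratio is preserved, i.e. when bx − ay = 0, a linear Diophantine equation. The Python sketch below illustrates this constraint only; it is not the authors' full hiding procedure.

```python
from math import gcd

def minimal_entropy_preserving_augmentation(a, b):
    """Smallest numbers (x, y) of class-0 / class-1 records that can be
    added to a node holding a class-0 and b class-1 records without
    changing its entropy.  Preserving the class ratio requires
    b*x - a*y = 0, a linear Diophantine equation whose smallest positive
    solution is x = a // g, y = b // g with g = gcd(a, b)."""
    g = gcd(a, b)
    return a // g, b // g

# Example: a node with 6 records of one class and 4 of the other keeps its
# entropy if records are added in the ratio 3:2.
print(minimal_entropy_preserving_augmentation(6, 4))  # (3, 2)
```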


Author(s):  
M. Carr ◽  
V. Ravi ◽  
G. Sridharan Reddy ◽  
D. Veranna

This paper profiles mobile banking users using machine learning techniques, namely Decision Tree, Logistic Regression, Multilayer Perceptron, and SVM, to test a research model with fourteen independent variables and one dependent variable (adoption). A survey was conducted and the results were analysed using these techniques. Using Decision Trees, the profile of the mobile banking adopter was identified. Comparing the different machine learning techniques, it was found that Decision Trees outperformed Logistic Regression, Multilayer Perceptron, and SVM. Of all the techniques, the Decision Tree is recommended for profiling studies because, apart from producing highly accurate results, it also yields ‘if–then’ classification rules. The classification rules provided here can be used to target potential customers for mobile banking adoption by offering them appropriate incentives.
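As an illustration of the kind of ‘if–then’ output that makes decision trees suitable for such profiling, the sketch below fits a tree with scikit-learn on synthetic stand-in data (the feature names and the adoption label are hypothetical placeholders, not the study's fourteen survey variables) and prints its rules.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical stand-ins for a few survey variables (5-point Likert scores).
features = ["perceived_usefulness", "perceived_ease_of_use", "trust"]

rng = np.random.default_rng(0)
X = rng.integers(1, 6, size=(200, len(features)))
y = (X[:, 0] + X[:, 2] > 6).astype(int)  # synthetic "adoption" label

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# export_text turns the fitted tree into readable if-then rules,
# which is the property the abstract highlights for profiling studies.
print(export_text(tree, feature_names=features))
```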


2014 ◽  
Vol 2014 ◽  
pp. 1-7 ◽  
Author(s):  
Zhongmei Zhou

A good classifier can correctly predict new data for which the class label is unknown, so it is important to construct a high-accuracy classifier; classification techniques are therefore very useful in ubiquitous computing. Associative classification achieves higher classification accuracy than some traditional rule-based classification approaches. However, the approach also has two major deficiencies. First, it generates a very large number of associative classification rules, especially when the minimum support is set low, which makes it difficult to select a high-quality rule set for classification. Second, the accuracy of associative classification depends on the settings of the minimum support and the minimum confidence. In comparison, some improved traditional rule-based classification approaches produce a compact classification rule set that plays an important role in prediction, and thus achieve both better efficiency and higher accuracy than associative classification. In this paper, we put forward a new classification approach called CMR (classification based on multiple classification rules). CMR combines the advantages of both associative classification and rule-based classification. Our experimental results show that CMR achieves higher accuracy than some traditional rule-based classification methods.
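The abstract does not detail how CMR combines rules, but the general idea of classifying with multiple matching classification rules can be sketched as follows: each rule carries an antecedent item set, a class label, and a confidence, and a record is assigned the class whose matching rules accumulate the highest total confidence. The sketch below is an illustration under those assumptions, not the authors' algorithm.

```python
from collections import defaultdict

# A rule is (antecedent, class_label, confidence); the antecedent is a set
# of items that must all appear in a record for the rule to match.
rules = [
    ({"outlook=sunny", "humidity=high"}, "no", 0.90),
    ({"outlook=sunny"}, "yes", 0.60),
    ({"windy=false"}, "yes", 0.70),
]

def classify(record, rules, default="yes"):
    """Combine every rule whose antecedent is contained in the record
    and pick the class with the highest summed confidence."""
    scores = defaultdict(float)
    for antecedent, label, confidence in rules:
        if antecedent <= record:
            scores[label] += confidence
    return max(scores, key=scores.get) if scores else default

# "yes": 0.60 + 0.70 from the two matching "yes" rules outweighs 0.90.
print(classify({"outlook=sunny", "humidity=high", "windy=false"}, rules))
```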


Big Data is a pressing challenge for the data analytics research community. Many conventional data analytics techniques have been extended to the MapReduce framework to process Big Data, but our literature review finds an absolute lack of rough set-based techniques for MapReduce systems. To address this gap, and recognizing the importance of rule-based classification techniques, we propose a rough set-based associative classification rule extraction process for the MapReduce framework. Implementation and evaluation on a standard Big Data dataset demonstrate the efficiency of the suggested approach.
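The rough set details are not given in the abstract, but the MapReduce shape of an associative-rule extraction step can be sketched: mappers emit candidate (antecedent, class) pairs for every record, and reducers aggregate the counts from which support and confidence are derived. The single-machine Python sketch below only mimics that map/reduce flow; it is not the authors' implementation.

```python
from collections import Counter
from itertools import combinations

records = [
    ({"age=young", "income=low"}, "no"),
    ({"age=young", "income=high"}, "yes"),
    ({"age=old", "income=high"}, "yes"),
]

def mapper(record):
    """Emit ((antecedent, class), 1) and (antecedent, 1) pairs for every
    non-empty subset of the record's condition items."""
    items, label = record
    for r in range(1, len(items) + 1):
        for antecedent in combinations(sorted(items), r):
            yield (antecedent, label), 1
            yield antecedent, 1

def reducer(pairs):
    """Sum the counts per key, as a MapReduce reduce phase would."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts

counts = reducer(pair for record in records for pair in mapper(record))

# Confidence of the rule {age=young} -> yes: matched once out of two records.
antecedent = ("age=young",)
print(counts[(antecedent, "yes")] / counts[antecedent])  # 0.5
```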


Author(s):  
Ricardo Timarán Pereira

Abstract: Decision tree classification is the most widely used and popular model because of its simplicity and ease of understanding. The most expensive step of the algorithm is computing, at each node, the value of the measure that selects the attribute with the greatest power to classify over the set of values of the class attribute. To compute this measure, the raw data are not needed, only the statistics on the number of records in which the condition attributes co-occur with the class attribute. Classification algorithms based on decision trees include ID-3, C4.5, SPRINT, and SLIQ; however, none of these algorithms is based on relational algebraic operators or implemented with SQL primitives. This paper presents Mate-tree, an algorithm for the classification data mining task based on the relational algebraic operators Mate, Entro, Gain, and Describe Classifier, implemented in the SQL Select clause through the SQL primitives Mate by, Entro(), Gain(), and Describe Classification Rules. These operators facilitate the computation of information gain, the construction of the decision tree, and the tight coupling of the algorithm with a DBMS. Keywords: Decision Trees, Data Mining, Relational Algebraic Operators, SQL Primitives, Classification Task.
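The Mate by, Entro(), and Gain() primitives themselves are not reproduced here, but the abstract's point, that information gain needs only the co-occurrence counts of a condition attribute with the class attribute (the aggregate a single GROUP BY-style query returns), can be illustrated with a short Python sketch. The weather-data counts are a standard textbook example, not taken from the paper.

```python
from math import log2

def entropy(counts):
    """Shannon entropy of a class-count distribution."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)

def information_gain(cooccurrence):
    """Information gain of a condition attribute computed from the
    (attribute value -> class -> count) statistics alone; these counts
    are exactly what a GROUP BY over the attribute and the class column
    would return, so the raw records are never scanned here."""
    class_totals = {}
    for value_counts in cooccurrence.values():
        for label, c in value_counts.items():
            class_totals[label] = class_totals.get(label, 0) + c
    n = sum(class_totals.values())
    parent = entropy(list(class_totals.values()))
    children = sum(
        sum(vc.values()) / n * entropy(list(vc.values()))
        for vc in cooccurrence.values()
    )
    return parent - children

# Counts of (outlook value, class) pairs from the classic weather dataset.
stats = {
    "sunny":    {"yes": 2, "no": 3},
    "overcast": {"yes": 4, "no": 0},
    "rain":     {"yes": 3, "no": 2},
}
print(round(information_gain(stats), 3))  # ~0.247
```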

