Instability of decision tree classification algorithms

Abstract: Decision tree classification is the most widely used and popular model because it is simple and easy to understand. The most expensive step of the algorithm is computing the measure that selects, at each node, the attribute with the greatest power to classify over the set of values of the class attribute. Computing this measure does not require the data themselves, only statistics on the number of records in which each test (condition) attribute value occurs together with each class attribute value. Decision tree classification algorithms include ID-3, C4.5, SPRINT and SLIQ. However, none of these algorithms is based on relational algebraic operators or implemented with SQL primitives. This paper presents Mate-tree, an algorithm for the classification data mining task based on the relational algebraic operators Mate, Entro, Gain and Describe Classifier. These operators are implemented in the SQL Select clause with the SQL primitives Mate by, Entro(), Gain() and Describe Classification Rules. They facilitate the calculation of Information Gain, the construction of the decision tree and the tight coupling of this algorithm with a DBMS.

Keywords: Decision Trees, Data Mining, Relational Algebraic Operators, SQL Primitives, Classification Task.
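The point that Information Gain depends only on record counts, not on the records themselves, can be illustrated with a minimal Python sketch. This is not the Mate-tree implementation or its SQL primitives; the function names `entropy` and `information_gain`, the table layout, and the toy counts below are assumptions made for illustration.

```python
from math import log2


def entropy(counts):
    """Shannon entropy, in bits, of a class distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def information_gain(contingency):
    """Information gain of one attribute, computed from counts only.

    `contingency` maps each attribute value to {class_label: record_count},
    i.e. the number of records combining that value with each class.
    """
    # Class totals at the node, summed over all attribute values.
    class_totals = {}
    for class_counts in contingency.values():
        for label, n in class_counts.items():
            class_totals[label] = class_totals.get(label, 0) + n
    total = sum(class_totals.values())

    # Entropy remaining after the split, weighted by partition size.
    remainder = sum(
        (sum(cc.values()) / total) * entropy(list(cc.values()))
        for cc in contingency.values()
    )
    return entropy(list(class_totals.values())) - remainder


# Hypothetical counts for an attribute "outlook" against a yes/no class.
outlook = {
    "sunny": {"yes": 2, "no": 3},
    "overcast": {"yes": 4, "no": 0},
    "rain": {"yes": 3, "no": 2},
}
print(round(information_gain(outlook), 3))  # prints 0.247
```

At each node, a tree builder would compute this value for every candidate attribute and split on the one with the highest gain, so a grouped count query per attribute is all the input it needs.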


Liver disorder diagnosis using linear, nonlinear and decision tree classification algorithms

International Journal of Engineering and Technology, 2016, Vol 8 (5), pp. 2059-2069
DOI: 10.21817/ijet/2016/v8i5/160805424
Cited By ~ 3
Author(s): Aman Singh, Babita Pandey
Keyword(s): Decision Tree, Liver Disorder, Classification Algorithms, Decision Tree Classification


Comparative Analysis of Various Decision Tree Classification Algorithms using WEKA

International Journal on Recent and Innovation Trends in Computing and Communication, 2015, Vol 3 (2), pp. 684-690
DOI: 10.17762/ijritcc2321-8169.150254
Cited By ~ 1
Author(s): Priyanka Sharma
Keyword(s): Comparative Analysis, Decision Tree, Classification Algorithms, Decision Tree Classification


Application of KNN and Decision Tree Classification Algorithms in the Prediction of Education Success from the Edu720 Platform

2019 4th International Conference on Smart and Sustainable Technologies (SpliTech), 2019
DOI: 10.23919/splitech.2019.8783102
Author(s): Omar Dervisevic, Emir Zunic, Dzenana Donko, Emir Buza
Keyword(s): Decision Tree, Classification Algorithms, Decision Tree Classification


Parallel formulations of decision-tree classification algorithms

Proceedings. 1998 International Conference on Parallel Processing (Cat. No.98EX205), 2002
DOI: 10.1109/icpp.1998.708491
Cited By ~ 11
Author(s): A. Srivastava, Eui-Hong Sam Han, V. Singh, V. Kumar
Keyword(s): Decision Tree, Classification Algorithms, Decision Tree Classification


Parallel Formulations of Decision-Tree Classification Algorithms

High Performance Data Mining, 2006, pp. 237-261
DOI: 10.1007/0-306-47011-x_2
Cited By ~ 11
Author(s): Anurag Srivastava, Eui-Hong Han, Vipin Kumar, Vineet Singh
Keyword(s): Decision Tree, Classification Algorithms, Decision Tree Classification


Decision tree classification of digital soil, weather, crop mapping and yield prediction using linear regression with region influences.

DOI: 10.31220/agrirxiv.2021.00072
2021
Author(s): Anna Rini, N. Hemalatha, Raji Sukumar
Keyword(s): Linear Regression, Decision Tree, Classification Algorithms, Yield Prediction, Digital Map, Decision Tree Classification, Regression Algorithms, Crop Mapping, Classification And Regression

Abstract: This project studies soil properties, crops and regional influences, along with their dependencies, which would later be used to build a digital map. Both classification and regression algorithms were applied, and a decision tree as well as a decision tree regressor was plotted to finalise the results. Of the 6 classification algorithms applied, the decision tree gave the highest accuracy (95.24%), and among the 3 regression algorithms, linear regression gave the most accurate results (100%).


A Comparative Study on Serial Decision Tree Classification Algorithms in Text Mining

International Journal of Intelligent Computing Research, 2016, Vol 7 (4)
DOI: 10.20533/ijicr.2042.4655.2016.0093
Author(s): Khaled M. Almunirawi, Ashraf Y. A. Maghari
Keyword(s): Text Mining, Decision Tree, Comparative Study, Classification Algorithms, Decision Tree Classification
