A Comparative Study on Serial Decision Tree Classification Algorithms in Text Mining

Abstract: Decision tree classification is the most widely used and popular classification model because it is simple and easy to understand. The most expensive step of the algorithm is computing, at each node, the value of the metric used to select the attribute with the greatest power to classify over the set of values of the class attribute. Computing this metric does not require the data themselves, only statistics on the number of records in which each condition (test) attribute value occurs together with each class attribute value. Decision tree classification algorithms include ID-3, C4.5, SPRINT and SLIQ. However, none of these algorithms is based on relational algebraic operators or implemented with SQL primitives. This paper presents Mate-tree, an algorithm for the classification data mining task based on the relational algebraic operators Mate, Entro, Gain and Describe Classifier. These operators are implemented in the SQL Select clause through the SQL primitives Mate by, Entro(), Gain() and Describe Classification Rules. They facilitate the calculation of information gain, the construction of the decision tree, and the tight coupling of the algorithm with a DBMS.

Keywords: Decision Trees, Data Mining, Relational Algebraic Operators, SQL Primitives, Classification Task.
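The abstract's point that information gain depends only on count statistics, not on the records themselves, can be illustrated with a minimal Python sketch. This is not the Mate-tree implementation or its SQL primitives. The function names `entropy` and `information_gain`, the `joint_counts` layout, and the sample counts are all assumptions made for illustration. The counts play the role of what an ordinary grouped count over (attribute value, class value) pairs would return.

```python
from collections import Counter
from math import log2


def entropy(class_counts):
    """Shannon entropy (in bits) of a class distribution given as counts."""
    total = sum(class_counts)
    return -sum((c / total) * log2(c / total) for c in class_counts if c > 0)


def information_gain(joint_counts):
    """Information gain of one attribute, computed only from counts.

    joint_counts maps (attribute_value, class_value) -> number of records.
    """
    class_totals = Counter()
    value_totals = Counter()
    for (value, cls), n in joint_counts.items():
        class_totals[cls] += n
        value_totals[value] += n
    total = sum(class_totals.values())

    # Entropy of the class attribute before splitting on the attribute.
    base = entropy(class_totals.values())

    # Weighted entropy of the class attribute within each attribute value.
    remainder = 0.0
    for value, n_value in value_totals.items():
        per_class = [n for (v, _), n in joint_counts.items() if v == value]
        remainder += (n_value / total) * entropy(per_class)
    return base - remainder


# Hypothetical counts for an "outlook" attribute against a yes/no class.
outlook_counts = {
    ("sunny", "yes"): 2, ("sunny", "no"): 3,
    ("overcast", "yes"): 4,
    ("rainy", "yes"): 3, ("rainy", "no"): 2,
}
gain = information_gain(outlook_counts)
print(f"information gain: {gain:.4f}")
```

At each node, a tree builder would evaluate this quantity for every candidate attribute and split on the one with the largest gain. Only the small table of counts has to leave the database, not the underlying rows.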

Liver disorder diagnosis using linear, nonlinear and decision tree classification algorithms

International Journal of Engineering and Technology ◽

10.21817/ijet/2016/v8i5/160805424 ◽

2016 ◽

Vol 8 (5) ◽

pp. 2059-2069 ◽

Cited By ~ 3

Author(s):

Aman Singh ◽

Babita Pandey

Keyword(s):

Decision Tree ◽

Liver Disorder ◽

Classification Algorithms ◽

Decision Tree Classification

Comparative Study of K-NN, Naive Bayes and Decision Tree Classification Techniques

International Journal of Science and Research (IJSR) ◽

10.21275/v5i1.nov153131 ◽

2016 ◽

Vol 5 (1) ◽

pp. 1842-1845 ◽

Cited By ~ 14

Keyword(s):

Decision Tree ◽

Comparative Study ◽

Naive Bayes ◽

Classification Techniques ◽

Decision Tree Classification

Comparative Analysis of Various Decision Tree Classification Algorithms using WEKA

International Journal on Recent and Innovation Trends in Computing and Communication ◽

10.17762/ijritcc2321-8169.150254 ◽

2015 ◽

Vol 3 (2) ◽

pp. 684-690 ◽

Cited By ~ 1

Author(s):

Priyanka Sharma ◽

Keyword(s):

Comparative Analysis ◽

Decision Tree ◽

Classification Algorithms ◽

Decision Tree Classification

Application of KNN and Decision Tree Classification Algorithms in the Prediction of Education Success from the Edu720 Platform

2019 4th International Conference on Smart and Sustainable Technologies (SpliTech) ◽

10.23919/splitech.2019.8783102 ◽

2019 ◽

Author(s):

Omar Dervisevic ◽

Emir Zunic ◽

Dzenana Donko ◽

Emir Buza

Keyword(s):

Decision Tree ◽

Classification Algorithms ◽

Decision Tree Classification

Performance evaluation of decision tree classification algorithms using fraud datasets

Bulletin of Electrical Engineering and Informatics ◽

10.11591/eei.v9i6.2630 ◽

2020 ◽

Vol 9 (6) ◽

pp. 2518-2525

Author(s):

Eddie Bouy B. Palad ◽

Mary Jane F. Burden ◽

Christian Ray Dela Torre ◽

Rachelle Bea C. Uy

Keyword(s):

Data Mining ◽

Text Mining ◽

Decision Tree ◽

The Philippines ◽

Decision Tree Classification ◽

Hoeffding Tree ◽

Tree Algorithms ◽

Incident Reports ◽

Mining Tool ◽

Potential Tool

Text mining is one way of extracting knowledge and uncovering hidden relationships among data using artificial intelligence methods. Previous research has highlighted the benefits of applying different techniques; however, the lack of literature focusing on cybercrimes suggests that data mining is underused in facilitating cybercrime investigations in the Philippines. This study, a continuation of a prior study, therefore classifies computer fraud or online scam data drawn from police incident reports and from the narratives of scam victims. The dataset consists mainly of unstructured data comprising 49,822 words, mostly Filipino. Five decision tree algorithms, namely J48, Hoeffding Tree, Decision Stump, REPTree, and Random Forest, were employed and compared in terms of their performance and prediction accuracy. The results show that J48 achieves the highest accuracy and the lowest error rate among the classifiers. The results were validated by police investigators, who likewise preferred J48 as a potential tool for cybercrime investigations. This indicates the importance of text mining for cybercrime investigation in the country. Future work could use different and more inclusive cybercrime datasets and other classification techniques in Weka or any other data mining tool.

Parallel formulations of decision-tree classification algorithms

Proceedings. 1998 International Conference on Parallel Processing (Cat. No.98EX205) ◽

10.1109/icpp.1998.708491 ◽

2002 ◽

Cited By ~ 11

Author(s):

A. Srivastava ◽

Eui-Hong Sam Han ◽

V. Singh ◽

V. Kumar

Keyword(s):

Decision Tree ◽

Classification Algorithms ◽

Decision Tree Classification
