Mate-tree: An Algorithm for the Classification Task Based on Algebraic Operators and SQL Primitives

Author(s):  
Ricardo Timarán Pereira

Abstract: Decision tree classification is the most widely used and popular classification model because it is simple and easy to understand. The most expensive step of the algorithm is computing, at each node, the measure used to select the attribute with the greatest power to discriminate among the values of the class attribute. Computing this measure does not require the data themselves, only statistics on the number of records in which each value of a test attribute co-occurs with each value of the class attribute. Decision tree classification algorithms include ID3, C4.5, SPRINT, and SLIQ. However, none of these algorithms is based on relational algebraic operators or implemented with SQL primitives. This paper presents Mate-tree, an algorithm for the classification data mining task based on the relational algebraic operators Mate, Entro, Gain, and Describe Classifier. These operators are implemented in the SQL Select clause through the SQL primitives Mate by, Entro(), Gain(), and Describe Classification Rules, which facilitate computing the Information Gain, building the decision tree, and tightly coupling the algorithm with a DBMS.
Keywords: Decision Trees, Data Mining, Relational Algebraic Operators, SQL Primitives, Classification Task.
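The abstract's key observation is that information gain needs only co-occurrence counts of attribute and class values, not the records themselves. The primitives Mate by, Entro() and Gain() are extensions proposed in the paper and are not available in standard SQL engines, so the sketch below is only an approximation of the idea: it uses a standard `GROUP BY ... COUNT(*)` query (via Python's built-in `sqlite3`) to collect the counts inside the DBMS and then computes the gain from those counts alone. The table `weather` and its columns are hypothetical.

```python
import sqlite3
from collections import defaultdict
from math import log2


def entropy_from_counts(counts):
    """Entropy (bits) of a class distribution given as raw record counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)


def info_gain_sql(conn, table, attribute, class_attr):
    """Information gain of `attribute` w.r.t. `class_attr`, using only GROUP BY counts.

    Identifiers are interpolated into the SQL string, so they must come from a trusted list.
    """
    rows = conn.execute(
        f"SELECT {attribute}, {class_attr}, COUNT(*) FROM {table} "
        f"GROUP BY {attribute}, {class_attr}"
    ).fetchall()
    counts_by_value = defaultdict(list)  # attribute value -> list of per-class counts
    class_totals = defaultdict(int)      # class value -> total record count
    for value, cls, n in rows:
        counts_by_value[value].append(n)
        class_totals[cls] += n
    total = sum(class_totals.values())
    conditional = sum(sum(c) / total * entropy_from_counts(c) for c in counts_by_value.values())
    return entropy_from_counts(list(class_totals.values())) - conditional


def build_demo_db():
    """Hypothetical four-row training table kept in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE weather (outlook TEXT, windy TEXT, play TEXT)")
    conn.executemany("INSERT INTO weather VALUES (?, ?, ?)", [
        ("sunny", "no", "no"), ("sunny", "yes", "no"),
        ("overcast", "no", "yes"), ("overcast", "yes", "yes"),
    ])
    return conn
```

On this toy table, `info_gain_sql(build_demo_db(), "weather", "outlook", "play")` gives 1.0 bit (outlook separates the classes perfectly), while the same call for `windy` gives 0.0. Only the grouped counts ever leave the database, which is the property the paper's operators exploit.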

Author(s):  
Y. Fakir, M. Azalmad, R. Elaychi

Data mining is the process of exploring large datasets to find patterns that support decision-making. One decision-making technique is classification, a form of data analysis that extracts models describing important data classes. There are many classification algorithms, each of which assigns objects to predefined classes. The decision tree is one such important technique: it builds a tree structure by incrementally breaking a dataset down into smaller subsets. Decision trees can be built with popular algorithms such as ID3, C4.5, and CART. This study uses the ID3 and C4.5 algorithms to build a decision tree with the "entropy" and "information gain" measures, which are the basic components behind the construction of a classifier model.
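ID3 and C4.5 both start from entropy and information gain, but C4.5 ranks attributes by the gain ratio, which divides the gain by the attribute's split information to counter the bias toward many-valued attributes. The sketch below shows only these two criteria on hypothetical toy columns; full C4.5 adds further heuristics (handling of continuous attributes, missing values, pruning) that are not shown.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Entropy (bits) of the empirical distribution of `items`."""
    n = len(items)
    return -sum(c / n * log2(c / n) for c in Counter(items).values())


def information_gain(values, labels):
    """ID3 criterion: class entropy minus the weighted entropy after splitting on `values`."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())


def gain_ratio(values, labels):
    """C4.5 criterion: information gain divided by the attribute's split information."""
    split_info = entropy(values)  # entropy of the attribute's own value distribution
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0


labels = ["yes", "yes", "no", "no"]
record_id = ["1", "2", "3", "4"]  # unique per row, so every split is trivially pure
windy = ["f", "f", "t", "t"]
```

Both hypothetical attributes reach the maximal information gain of 1.0 bit, so ID3 cannot tell them apart, but the gain ratio is 0.5 for `record_id` (split information log2 4 = 2) and 1.0 for `windy`, so C4.5 prefers the attribute that generalizes.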


2021, pp. 1-10
Author(s):  
Chao Dong, Yan Guo

The wide application of artificial intelligence in many fields has accelerated efforts to uncover the information hidden in large amounts of data, and there is growing interest in using data mining methods to support effective research on higher education management. Decision tree classification, a data analysis method within data mining, offers high classification accuracy, intuitive decision results, and strong generalization ability, which make it well suited to higher education management. To address the sensitivity of data processing and decision tree classification to noisy data, this paper proposes corresponding improvements, including a variable precision rough set attribute selection criterion based on a scale function, which considers both the weighted approximation accuracy of an attribute and its number of attribute values. This improves robustness to noisy data, reduces bias in attribute selection, and increases classification accuracy. In addition, a suppression factor threshold, support, and confidence are introduced in the tree pre-pruning process, which simplifies the tree structure. Comparative experiments on standard datasets show that the proposed algorithm outperforms other decision tree algorithms and can effectively realize differentiated classification for higher education management.
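The paper's criterion weights variable precision rough set (VPRS) approximation accuracy with a scale function and the attribute's number of values; its exact formula is not given in the abstract, so the sketch below computes only the standard VPRS quality of classification for a single attribute, the kind of quantity such criteria build on. It assumes the inclusion-degree convention with a threshold `beta` in (0.5, 1]; some authors instead use an error threshold in [0, 0.5). The data are hypothetical.

```python
from collections import Counter, defaultdict


def vprs_quality(values, labels, beta=0.8):
    """Share of objects in the beta-positive region induced by one condition attribute.

    An equivalence class (all objects sharing an attribute value) counts as positive when
    at least a fraction `beta` of its members belong to a single decision class.
    beta = 1.0 reduces to the classical (crisp) rough set positive region.
    """
    if not 0.5 < beta <= 1.0:
        raise ValueError("beta must lie in (0.5, 1]")
    blocks = defaultdict(list)
    for v, y in zip(values, labels):
        blocks[v].append(y)
    positive = 0
    for members in blocks.values():
        majority = Counter(members).most_common(1)[0][1]
        if majority / len(members) >= beta:
            positive += len(members)
    return positive / len(labels)


values = ["a", "a", "a", "a", "b", "b", "c"]
labels = ["y", "y", "y", "n", "n", "n", "y"]
```

Here the block for value `a` is 75% pure: with `beta=0.8` it is excluded and the quality is 3/7, while `beta=0.7` tolerates that one noisy record and the quality rises to 1.0. This tolerance for a bounded fraction of misclassified objects is what makes VPRS-based selection less sensitive to noise than a crisp positive region.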


2018, Vol. 2 (2), pp. 167
Author(s):  
Marko Ferdian Salim, Sugeng Sugeng

Background: Diabetes mellitus is a chronic disease that imposes broad economic and social burdens. Patient data are recorded in the medical record system stored in the hospital information system database, but the recorded data have not been analysed effectively to produce valuable information. Data mining techniques can be used to produce such information. Objective: To identify the characteristics of diabetes mellitus patients and the trends and types of diabetes mellitus by applying data mining techniques at RSUP Dr. Sardjito Yogyakarta. Methods: This is a descriptive observational study with a cross-sectional design. Data were collected retrospectively through observation and documentation study of electronic medical records at RSUP Dr. Sardjito Yogyakarta. The collected data were then analysed with the Weka application. Results: There were 1,554 diabetes mellitus patients at RSUP Dr. Sardjito in 2011-2016, with a declining trend. Most patients were aged 56-63 years (27.86%). Cases were dominated by type 2 diabetes mellitus, with hypertension, nephropathy, and neuropathy as the most common complications. Applying the J48 decision tree algorithm (accuracy 88.42%) to patient medical records produced several rules. Conclusion: Data mining classification (accuracy 88.42%) with decision trees successfully identified patient characteristics and found several rules that the hospital can use in decision-making regarding diabetes mellitus.


Author(s):  
Malcolm J. Beynon

The seminal work of Zadeh (1965), namely fuzzy set theory (FST), has developed into a methodology fundamental to analysis that incorporates vagueness and ambiguity. Data mining, for its part, endeavours to find potentially meaningful patterns in data (Hu & Tzeng, 2003). This includes the construction of if-then decision rule systems, which aim for a level of inherent interpretability in the antecedents and consequents identified for object classification (see Breiman, 2001). In a fuzzy environment this is extended to allow a linguistic facet to the interpretation, with examples including mining time series data (Chiang, Chow, & Wang, 2000) and multi-objective optimisation (Ishibuchi & Yamamoto, 2004). One approach to if-then rule construction is through decision trees (Quinlan, 1986), where each path down a branch of the tree (through a series of nodes) is associated with a single if-then rule. A key characteristic of traditional decision tree analysis is that the antecedents described in the nodes are crisp; this restriction is mitigated when operating in a fuzzy environment (Crockett, Bandar, Mclean, & O’Shea, 2006). This chapter investigates the use of fuzzy decision trees as an effective tool for data mining. Pertinent to data mining and decision making, Mitra, Konwar and Pal (2002) succinctly describe a most important feature of decision trees, crisp and fuzzy: their capability to break down a complex decision-making process into a collection of simpler decisions, thereby providing an easily interpretable solution.
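To make the crisp-versus-fuzzy contrast concrete: a crisp node sends a record down exactly one branch, whereas a fuzzy node tests a linguistic term such as "young" with a membership degree in [0, 1], so a record can follow several root-to-leaf paths at once with partial strength. The sketch below is a generic illustration, not the chapter's specific method: it assumes triangular membership functions, combines the antecedents along each path (each path being one if-then rule) with the min t-norm, and aggregates per class with max. All terms, breakpoints, and rules are hypothetical; other t-norms (e.g. product) and inference schemes are also used.

```python
def triangular(a, b, c):
    """Triangular membership function: 0 outside (a, c), rising linearly to 1 at the peak b."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)
    return mu


# Hypothetical linguistic terms for two attributes.
TERMS = {
    "age": {"young": triangular(0, 20, 45), "old": triangular(30, 60, 90)},
    "income": {"low": triangular(0, 20, 50), "high": triangular(30, 80, 130)},
}

# Each rule is one root-to-leaf path: a list of (attribute, term) antecedents and a class.
RULES = [
    ([("age", "young"), ("income", "low")], "reject"),
    ([("age", "young"), ("income", "high")], "accept"),
    ([("age", "old")], "accept"),
]


def firing_strength(path, record):
    """Min t-norm: a path holds only as strongly as its weakest antecedent."""
    return min(TERMS[attr][term](record[attr]) for attr, term in path)


def classify(record):
    """Return the class with the strongest supporting path, plus every class's score."""
    scores = {}
    for path, label in RULES:
        scores[label] = max(scores.get(label, 0.0), firing_strength(path, record))
    return max(scores, key=scores.get), scores
```

For a record with age 35 and income 40, all three paths fire partially (young 0.4, old about 0.17, low about 0.33, high 0.2), and `classify` returns `"reject"` with strength 1/3 against 0.2 for `"accept"`; a crisp tree would have reported a single path with no indication of how borderline the record is.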


Entropy, 2019, Vol. 21 (1), pp. 66
Author(s):  
Georgios Feretzakis, Dimitris Kalles, Vassilios S. Verykios

Data sharing among organizations has become increasingly common in sectors such as advertising, marketing, electronic commerce, banking, and insurance. However, an organization that shares its datasets with others will most likely try to keep some patterns as hidden as possible. This paper focuses on preserving the privacy of sensitive patterns when inducing decision trees. We adopt a record augmentation approach to hide critical classification rules in binary datasets. This hiding methodology is preferred over other heuristic solutions, such as output perturbation or cryptographic techniques, which limit the usability of the data, because the raw data itself remains readily available for public use. We propose a look-ahead technique that uses linear Diophantine equations to add the appropriate number of instances while maintaining the initial entropy of the nodes. The method can be used to hide one or more decision tree rules optimally.
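Why Diophantine equations appear: binary entropy depends only on the class proportion q and is symmetric, H(q) = H(1 - q), so adding x positive and y negative records to a node holding p positives and n negatives keeps its entropy exactly when the new ratio is p:n or the mirrored n:p. Each condition is a linear equation in the integers x, y. The sketch below is a simplified, single-node illustration of that constraint, not the authors' look-ahead procedure over the whole tree: it solves both equations with the extended Euclidean algorithm and returns the cheaper solution. The parameter `x_min` is a hypothetical stand-in for however many positive records the hiding step requires, and both p and n are assumed to be positive.

```python
from math import log2


def entropy(p, n):
    """Binary entropy (bits) of a node holding p positive and n negative records."""
    total = p + n
    return -sum(c / total * log2(c / total) for c in (p, n) if c)


def ext_gcd(a, b):
    """Return (g, s, t) such that a*s + b*t == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, s, t = ext_gcd(b, a % b)
    return g, t, s - (a // b) * t


def solve_nonneg(a, b, c, x_min=0):
    """Least solution of a*x - b*y == c with x >= x_min and y >= 0 (a, b > 0), or None."""
    g, s, t = ext_gcd(a, b)
    if c % g:
        return None  # no integer solutions at all
    x0, y0 = s * (c // g), -t * (c // g)  # one particular solution
    dx, dy = b // g, a // g               # general solution: (x0 + k*dx, y0 + k*dy)
    k = max(-((x0 - x_min) // dx), -(y0 // dy))  # smallest k meeting both bounds
    return x0 + k * dx, y0 + k * dy


def entropy_preserving_additions(p, n, x_min):
    """Fewest added records (x positive, y negative), x >= x_min, keeping the node's entropy."""
    candidates = [
        solve_nonneg(n, p, 0, x_min),              # (p+x):(n+y) == p:n
        solve_nonneg(p, n, n * n - p * p, x_min),  # (p+x):(n+y) == n:p (mirror ratio)
    ]
    # The ratio-preserving equation always has a solution, so the list is never empty.
    return min((c for c in candidates if c is not None), key=sum)
```

For a node with 3 positives and 1 negative that must receive at least one new positive, the ratio-preserving equation gives (3, 1), yielding a (6, 2) node, and the mirrored one gives (1, 11), yielding a (4, 12) node; both keep the node's entropy, and the function returns the cheaper (3, 1).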


Author(s):  
M. Carr, V. Ravi, G. Sridharan Reddy, D. Veranna

This paper profiles mobile banking users with the machine learning techniques Decision Tree, Logistic Regression, Multilayer Perceptron, and SVM, testing a research model with fourteen independent variables and one dependent variable (adoption). A survey was conducted and the results were analysed with these techniques. Decision Trees were used to identify the profile of the mobile banking adopter. In the comparison of techniques, Decision Trees outperformed Logistic Regression, Multilayer Perceptron, and SVM. Of all the techniques, Decision Trees are recommended for profiling studies because, apart from yielding highly accurate results, they also produce ‘if–then’ classification rules. These classification rules can be used to target potential customers for mobile banking adoption by offering them appropriate incentives.
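The ‘if–then’ rules mentioned above come directly from the tree: every root-to-leaf path becomes one rule whose antecedent is the conjunction of the tests along the path. The sketch below shows that extraction on a small hand-built binary tree; the node classes and the two predictor names (`perceived_usefulness`, `age`) are hypothetical and are not taken from the paper's fourteen variables.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str


@dataclass
class Node:
    attribute: str
    threshold: float
    left: "Tree"   # subtree for records with attribute <= threshold
    right: "Tree"  # subtree for records with attribute > threshold


Tree = Union[Leaf, Node]


def extract_rules(tree: Tree, conditions: tuple = ()) -> list:
    """Return one 'IF ... THEN ...' rule per root-to-leaf path."""
    if isinstance(tree, Leaf):
        antecedent = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {antecedent} THEN {tree.label}"]
    return (
        extract_rules(tree.left, conditions + (f"{tree.attribute} <= {tree.threshold}",))
        + extract_rules(tree.right, conditions + (f"{tree.attribute} > {tree.threshold}",))
    )


# Hypothetical tree over two survey predictors.
tree = Node("perceived_usefulness", 3.5,
            Leaf("non-adopter"),
            Node("age", 40, Leaf("adopter"), Leaf("non-adopter")))
```

`extract_rules(tree)` yields three rules, for example `IF perceived_usefulness > 3.5 AND age <= 40 THEN adopter`, which is the kind of directly actionable targeting rule that a logistic regression or multilayer perceptron does not provide.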


2021, Vol. 7, pp. e424
Author(s):  
G Sekhar Reddy, Suneetha Chittineni

Information efficiency is gaining importance in both the development and the application sectors of information technology. Data mining is a computer-assisted process of investigating massive data to extract meaningful information from datasets, and the mined information is used in decision-making to understand the behavior of each attribute. This paper therefore introduces a new classification algorithm to improve information management. The classical C4.5 decision tree approach is combined with the Selfish Herd Optimization (SHO) algorithm to tune the information gain on given datasets: the optimal weights for the information gain are updated by SHO. The dataset is then partitioned into two classes based on quadratic entropy and information gain. Optimizing the decision tree gain is the main aim of the proposed C4.5-SHO method. Its robustness is evaluated on various datasets and compared with classifiers such as ID3 and CART, and its accuracy and area under the receiver operating characteristic (ROC) curve are compared with those of existing algorithms such as ant colony optimization, particle swarm optimization, and cuckoo search.
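Quadratic entropy, 1 minus the sum of squared class proportions (numerically equal to the Gini index), can replace Shannon entropy in the gain computation. The abstract does not give the exact way SHO's weights enter the gain, so the sketch below assumes one plausible reading, an attribute-level weight multiplying each attribute's quadratic-entropy gain; the weights here are fixed hypothetical numbers standing in for what an optimizer such as SHO would tune, and the columns are toy data.

```python
from collections import Counter


def quadratic_entropy(labels):
    """Quadratic (Gini-type) entropy: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def quadratic_gain(values, labels):
    """Reduction in quadratic entropy obtained by splitting on the attribute `values`."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * quadratic_entropy(g) for g in groups.values())
    return quadratic_entropy(labels) - remainder


def best_attribute(columns, labels, weights):
    """Pick the attribute with the largest weighted gain (weights stand in for optimizer output)."""
    return max(columns, key=lambda a: weights[a] * quadratic_gain(columns[a], labels))


columns = {"A": ["x", "x", "y", "y"], "B": ["u", "v", "u", "v"]}
labels = ["p", "p", "n", "n"]
weights = {"A": 1.0, "B": 1.2}  # hypothetical, would be tuned by the optimizer
```

Attribute `A` separates the two classes perfectly (gain 0.5, the full quadratic entropy of the node), while `B` leaves each branch mixed (gain 0.0), so `best_attribute` returns `"A"` even though `B` carries the larger weight. In a real tuning loop, the optimizer would adjust `weights` to maximize a validation metric such as accuracy or ROC AUC.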

