Data Classification Using Decision Trees J48 Algorithm for Text Mining of Business Data

Author(s):  
Asif Yaseen

Businesses generate large volumes of data every day through deals and financial transactions. Much of this data is produced in the course of pursuing customer satisfaction and meeting customer needs, so data is created at every step. This data holds considerable value that is hidden from regular users, and data analytics is used to uncover it. In our project, we use a business-related dataset that contains strings and their class labels (0 or 1), which mark each string as positive or negative. To analyze this data, we apply a decision tree classification algorithm (specifically J48) to perform text mining (classification) on the target dataset. Text mining, as used here, falls under supervised learning. In text mining, we generally use two datasets: the first is used to train the model, and the trained model is then used to predict the missing class labels in the second.
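The train-then-predict workflow described in this abstract can be illustrated with a minimal sketch. It is not J48 (Weka's implementation of C4.5, which grows and prunes a full tree): it fits only a one-level tree (a decision stump) on word presence, chosen by information gain. The function names, the sample strings, and the convention that 1 means positive are all assumptions made for illustration.

```python
from math import log2

def entropy(labels):
    """Entropy (in bits) of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return -sum(q * log2(q) for q in (p, 1 - p) if q > 0)

def majority(labels, default):
    """Most common 0/1 label (ties go to 1); `default` if the list is empty."""
    if not labels:
        return default
    return int(2 * sum(labels) >= len(labels))

def train_stump(texts, labels):
    """Pick the single word whose presence gives the highest information gain."""
    docs = [set(t.lower().split()) for t in texts]
    vocab = sorted(set().union(*docs))
    base = entropy(labels)
    best_word, best_gain = None, -1.0
    for word in vocab:
        present = [y for d, y in zip(docs, labels) if word in d]
        absent = [y for d, y in zip(docs, labels) if word not in d]
        remainder = (len(present) * entropy(present)
                     + len(absent) * entropy(absent)) / len(labels)
        if base - remainder > best_gain:
            best_word, best_gain = word, base - remainder
    present = [y for d, y in zip(docs, labels) if best_word in d]
    absent = [y for d, y in zip(docs, labels) if best_word not in d]
    overall = majority(labels, 1)
    return best_word, majority(present, overall), majority(absent, overall)

def predict(model, texts):
    """Label each text by whether it contains the stump's split word."""
    word, if_present, if_absent = model
    return [if_present if word in set(t.lower().split()) else if_absent
            for t in texts]

# First dataset: labeled strings (hypothetical; 1 = positive is an assumption).
train_texts = [
    "great service and fast delivery",
    "fast refund great support",
    "late delivery and poor support",
    "poor service and late refund",
]
train_labels = [1, 1, 0, 0]

# Second dataset: strings whose class labels are missing.
test_texts = ["fast and friendly", "late again"]

model = train_stump(train_texts, train_labels)
predictions = predict(model, test_texts)
print(model, predictions)
```

A real J48 run would recurse on each branch, use gain ratio rather than raw information gain, and prune the resulting tree; the stump only shows how a model fitted on the first dataset supplies labels for the second.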

Author(s):  
Karen Medhat ◽  
Rabie A. Ramadan ◽  
Ihab Talkhan

This chapter introduces two algorithms for detecting intrusions in mission-critical communication systems in order to guarantee their security. The first is a classification algorithm that applies supervised learning. The second is a clustering algorithm that applies unsupervised learning. Both algorithms detect intrusions using a set of detection rules structured as decision trees. The algorithms are described in detail, and their results on a well-known dataset are presented. An enhancement to the J48 algorithm is also introduced, in which the algorithm's decision tree is changed to a binary tree. This change improves the complexity of reaching a decision. The chapter includes a brief introduction to security in mission-critical systems and the reasons for securing such systems. It also reviews the methodologies that have been proposed for detecting intrusions in wireless communications.


Author(s):  
Tsehay Admassu Assegie ◽  
Pramod Sekharan Nair

Handwritten digit recognition is an area of machine learning in which a machine is trained to identify handwritten digits. One method of achieving this is a decision tree classification model. Decision tree classification is a machine learning approach that uses predefined labels from previously known data sets to predict the classes of future data sets whose class labels are unknown. In this paper, we use the standard Kaggle digits dataset to recognize handwritten digits with a decision tree classification approach, and we evaluate the accuracy of the model for each digit from 0 to 9.
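The per-digit evaluation mentioned above can be sketched as follows. The abstract does not say how per-digit accuracy was computed, so this sketch assumes it means the fraction of samples of each true digit that the model labeled correctly (per-class recall). The function name and the sample label lists are hypothetical.

```python
from collections import defaultdict

def per_class_accuracy(y_true, y_pred):
    """Fraction of samples of each true class that were predicted correctly."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return {label: correct[label] / total[label] for label in sorted(total)}

# Hypothetical true and predicted digit labels, not results from the paper.
y_true = [0, 0, 1, 1, 1, 7, 7, 9]
y_pred = [0, 0, 1, 7, 1, 7, 1, 9]

scores = per_class_accuracy(y_true, y_pred)
for digit, acc in scores.items():
    print(f"digit {digit}: {acc:.2f}")
```

Reporting accuracy per digit rather than one overall figure shows which digits the tree confuses, such as 1 and 7 in this made-up sample.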


Author(s):  
Ricardo Timarán Pereira

Abstract: Decision tree classification is the most widely used and popular model because it is simple and easy to understand. The most expensive step of the algorithm is computing the measure that selects, at each node, the attribute with the greatest power to classify over the set of values of the class attribute. Computing this measure does not require the data themselves, only statistics on the number of records in which the test attributes are combined with the class attribute. Decision tree classification algorithms include ID-3, C4.5, SPRINT and SLIQ. However, none of these algorithms is based on relational algebraic operators or implemented with SQL primitives. This paper presents Mate-tree, an algorithm for the classification data mining task based on the relational algebraic operators Mate, Entro, Gain and Describe Classifier. These operators are implemented in the SQL Select clause with the SQL primitives Mate by, Entro(), Gain() and Describe Classification Rules. They facilitate the calculation of Information Gain, the construction of the decision tree, and the tight coupling of this algorithm with a DBMS.

Keywords: Decision Trees, Data Mining, Relational Algebraic Operators, SQL Primitives, Classification Task.
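The point that the splitting measure needs only record counts, not the records themselves, can be illustrated with a short sketch. It computes Information Gain from a contingency table of attribute value versus class. It is plain Python rather than the paper's SQL primitives, and the function names and the example counts are assumptions made for illustration.

```python
from math import log2

def entropy(counts):
    """Shannon entropy (in bits) of a class distribution given as record counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def information_gain(contingency):
    """Information gain of an attribute, computed from counts only.

    `contingency` maps each attribute value to a list of per-class record counts.
    """
    rows = list(contingency.values())
    n_classes = len(rows[0])
    class_totals = [sum(row[k] for row in rows) for k in range(n_classes)]
    total = sum(class_totals)
    # Weighted entropy of the partitions induced by the attribute.
    remainder = sum(sum(row) / total * entropy(row) for row in rows)
    return entropy(class_totals) - remainder

# Hypothetical counts: attribute value -> [records of class 0, records of class 1]
outlook = {"sunny": [3, 2], "overcast": [0, 4], "rainy": [2, 3]}

gain = information_gain(outlook)
print(f"information gain: {gain:.4f}")
```

A table of this shape is what a grouped count query over a condition attribute and the class attribute would return, which is why the computation can sit close to the database.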


2019 ◽  
Vol 2 (2) ◽  
pp. 119-134
Author(s):  
Saiful Rizal ◽  
Candra Kurniawan ◽  
Fahrur Rozi

Batu Ampar Port is the largest cargo port in Batam City and has the highest traffic for both export and import activities. Dwelling time remains a problem in port services and is one of the indicators of port management efficiency. In the first quarter of 2015, the average dwelling time at Batu Ampar Port was 7 days for unloading and 5 days for loading. As a result, the port's performance still draws many complaints, and long queues of ships form. An analysis is therefore needed to produce a model that describes dwelling time at the port and to evaluate the analytical model that has been built. Secondary data from Batu Ampar Port were analyzed using data mining. The data mining methods used are supervised learning algorithms, namely multiple regression and decision trees. The general purpose of multiple regression is to learn more about the relationship between several independent (predictor) variables and a dependent (criterion) variable. The decision trees used to explore the port data perform classification. Decision tree classification can reveal whether the data contain well-separated classes of objects, so that the classes can be interpreted meaningfully in the context of a substantive theory. Two model evaluation methods were applied to the two models built. Analysis of Variance (ANOVA) was used to evaluate the multiple regression model, while the decision tree model was evaluated with a confusion matrix. The data analysis shows that the time a ship spends loading or unloading is influenced by three variables: expedition type, flag, and volume. Multiple regression yields a prediction model for ship berthing time. The model evaluation shows that the model is significant. At a 95% confidence level, the predictive model will represent the actual values. For the decision tree, the evaluation shows that the model fits, with a precision of 84.50%.
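The precision figure reported for the decision tree comes from a confusion matrix. A minimal sketch of that calculation follows, assuming rows hold true classes and columns hold predicted classes. The function name and the matrix values are hypothetical and are not the study's data.

```python
def precision_per_class(matrix):
    """Precision of each class from a square confusion matrix.

    matrix[i][j] is the number of records with true class i predicted as class j.
    Precision for class j is the diagonal entry divided by the column sum.
    """
    n = len(matrix)
    result = []
    for j in range(n):
        predicted = sum(matrix[i][j] for i in range(n))
        result.append(matrix[j][j] / predicted if predicted else 0.0)
    return result

# Hypothetical two-class confusion matrix (illustrative values only).
matrix = [
    [50, 10],  # true class 0: 50 predicted as 0, 10 predicted as 1
    [5, 35],   # true class 1: 5 predicted as 0, 35 predicted as 1
]

precisions = precision_per_class(matrix)
for label, value in enumerate(precisions):
    print(f"class {label}: precision {value:.2%}")
```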


2020 ◽  
Vol 9 (6) ◽  
pp. 2518-2525
Author(s):  
Eddie Bouy B. Palad ◽  
Mary Jane F. Burden ◽  
Christian Ray Dela Torre ◽  
Rachelle Bea C. Uy

Text mining is one way of extracting knowledge and discovering hidden relationships in data using artificial intelligence methods. Previous research has highlighted the benefits of various techniques; however, the lack of literature focusing on cybercrimes implies that data mining is underused in facilitating cybercrime investigations in the Philippines. This study therefore classifies computer fraud or online scam data drawn from police incident reports and from narratives of scam victims, as a continuation of a prior study. The dataset consists mainly of unstructured data comprising 49,822 words, mostly Filipino. Five decision tree algorithms, namely J48, Hoeffding Tree, Decision Stump, REPTree, and Random Forest, were employed and compared in terms of performance and prediction accuracy. The results show that J48 achieves the highest accuracy and the lowest error rate among the classifiers. The results were validated by police investigators, who likewise preferred J48 as a potential tool for cybercrime investigations. This indicates the importance of text mining for cybercrime investigation in the country. Future work can use different and more inclusive cybercrime datasets and other classification techniques in Weka or any other data mining tool.


1986 ◽  
Vol 25 (04) ◽  
pp. 207-214
Author(s):  
P. Glasziou

Summary: The development of investigative strategies by decision analysis has been achieved by explicitly drawing the decision tree, either by hand or on computer. This paper discusses the feasibility of automatically generating and analysing decision trees from a description of the investigations and the treatment problem. The investigation of cholestatic jaundice is used to illustrate the technique. Methods to decrease the number of calculations required are presented. It is shown that this method makes practical the simultaneous study of at least half a dozen investigations. However, some new problems arise from the possible complexity of the resulting optimal strategy. If protocol errors and delays due to testing are considered, simpler strategies become desirable. The generation and assessment of these simpler strategies are discussed with examples.

