Performance evaluation of decision tree classification algorithms using fraud datasets

Text mining is one way of extracting knowledge and uncovering hidden relationships in data using artificial intelligence methods. Previous research has highlighted the benefits of various techniques; however, the scarcity of literature on cybercrime suggests that data mining is underused in cybercrime investigations in the Philippines. As a continuation of a prior study, this study therefore classifies computer fraud or online scam data drawn from police incident reports and narratives of scam victims. The dataset consists mainly of unstructured text totaling 49,822 words, mostly Filipino. Five decision tree algorithms, namely J48, Hoeffding Tree, Decision Stump, REPTree, and Random Forest, were employed and compared in terms of their performance and prediction accuracy. The results show that J48 achieves the highest accuracy and the lowest error rate among the classifiers. The results were validated by police investigators, who likewise preferred J48 as a potential tool for cybercrime investigations. This indicates the importance of text mining for cybercrime investigation in the country. Future work could use different and more inclusive cybercrime datasets and other classification techniques in Weka or any other data mining tool.
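
The comparison criterion described in this abstract (highest accuracy, lowest error rate) can be sketched in a few lines. This is only an illustration, not the study's Weka workflow: the classifier names `model_a` and `model_b`, the labels, and the predictions are all invented placeholders.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def rank_classifiers(y_true, predictions):
    """Return (name, accuracy, error_rate) tuples, best accuracy first."""
    scores = [(name, accuracy(y_true, y_pred)) for name, y_pred in predictions.items()]
    scores.sort(key=lambda item: item[1], reverse=True)
    return [(name, acc, 1.0 - acc) for name, acc in scores]


# Hypothetical labels and predictions, invented for illustration only.
y_true = ["scam", "scam", "other", "other", "scam"]
predictions = {
    "model_a": ["scam", "scam", "other", "scam", "scam"],
    "model_b": ["scam", "other", "other", "scam", "other"],
}

ranking = rank_classifiers(y_true, predictions)
for name, acc, err in ranking:
    print(f"{name}: accuracy={acc:.2f}, error rate={err:.2f}")
```

Error rate is taken here as one minus accuracy; a fuller evaluation would normally use cross-validation rather than a single set of predictions.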

Improved differentiation classification of variable precision artificial intelligence higher education management

Journal of Intelligent & Fuzzy Systems ◽ 10.3233/jifs-219036 ◽ 2021 ◽ pp. 1-10

Author(s): Chao Dong ◽ Yan Guo

Keyword(s): Artificial Intelligence ◽ Higher Education ◽ Data Mining ◽ Decision Tree ◽ Classification Accuracy ◽ Attribute Selection ◽ Higher Education Management ◽ Education Management ◽ Decision Tree Classification

The wide application of artificial intelligence technology in various fields has accelerated the exploration of hidden information in large amounts of data. Researchers hope to use data mining methods to study higher education management effectively. The decision tree classification algorithm, a data analysis method in data mining, offers high classification accuracy, intuitive decision results, and strong generalization ability, which make it a well-suited method for higher education management. To address the sensitivity of data processing and decision tree classification to noisy data, this paper proposes a variable precision rough set attribute selection criterion based on a scale function, which considers both the weighted approximation accuracy and the number of attribute values of each attribute. This improves robustness to noisy data, reduces the bias in attribute selection, and improves classification accuracy. In addition, a suppression factor threshold, support, and confidence are introduced in the tree pre-pruning process, which simplifies the tree structure. Comparative experiments on standard data sets show that the improved algorithm outperforms other decision tree algorithms and can effectively realize the differentiated classification of higher education management.

Data Mining and Ergonomic Evaluation of Firefighter’s Motion Based on Decision Tree Classification Model

Communications in Computer and Information Science - Advanced Research on Computer Science and Information Engineering ◽ 10.1007/978-3-642-21411-0_35 ◽ 2011 ◽ pp. 212-217 ◽ Cited By ~ 1

Author(s): Lifang Yang ◽ Tianjiao Zhao

Keyword(s): Data Mining ◽ Decision Tree ◽ Classification Model ◽ Decision Tree Classification ◽ Ergonomic Evaluation

An Application of Text Mining to Capture and Analyze eWOM

Advances in Marketing, Customer Relationship Management, and E-Services - Capturing, Analyzing, and Managing Word-of-Mouth in the Digital Marketplace ◽ 10.4018/978-1-4666-9449-1.ch010 ◽ 2016 ◽ pp. 168-186 ◽ Cited By ~ 2

Author(s): Taşkın Dirsehan

Keyword(s): Data Mining ◽ Text Mining ◽ Customer Relationship ◽ Competitive Advantages ◽ Strategic Decisions ◽ Data Mining Tool ◽ Mining Tool ◽ The Moment ◽ Mining Tools

The marketing concept has progressed through several phases of evolution. At present, customer relationship management is considered the latest era of marketing development. The main purpose of this approach is to build long-term, profitable relationships with customers, so companies need to know their customers better. This knowledge can be created through a deeper analysis of company data with data mining tools, and companies that are able to use such tools will gain strong competitive advantages in their strategic decisions. The hotel industry is selected for this study because it provides a warehouse of customer comments from which valuable knowledge can be obtained if text mining is used appropriately as a data mining tool. This study therefore explains the stages of text mining using RapidMiner. Finally, different approaches based on customer satisfaction and dissatisfaction are discussed as ways to build competitive advantages.

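PENERAPAN ALGORITMA C4.5 UNTUK PENENTUAN KELAYAKAN PEMBERIAN KREDIT (Studi Kasus : Koperia - Koperasi Warga Komplek Gandaria) [Application of the C4.5 Algorithm to Determine Credit Eligibility (Case Study: Koperia - Koperasi Warga Komplek Gandaria)]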

Jurnal Algoritma, Logika dan Komputasi ◽ 10.30813/j-alu.v2i1.1573 ◽ 2019 ◽ Vol 2 (1)

Author(s): Teguh Budi Santoso ◽ Dela Sekardiana

Keyword(s): Data Mining ◽ Decision Tree ◽ Classification Model ◽ Decision Tree Classification ◽ C4.5 Algorithm ◽ Credit Worthiness ◽ Loan Amount

Credit granting at KOPERIA (Koperasi Warga Komplek Gandaria) is currently still based on an objective process. Cooperative managers often have difficulty determining whether granting credit is feasible, and as a result customers default on their credit installment payments. This study aims to build a decision tree classification model to determine a customer's creditworthiness. The C4.5 algorithm is applied to the following attributes: income, divided into two categories (> 5 million and 3-5 million); balance, divided into three (> 3 million, 1-3 million, and < 1 million); Loan Amount, divided into three (1-4 months, 5-8 months, and 9-12 months); and Requirements, with the values business capital, buying goods, and others. After the appropriate root nodes are determined, classification with the C4.5 algorithm achieves an accuracy of 97.5%, which indicates that the C4.5 algorithm is suitable for determining the feasibility of lending to KOPERIA customers.

Keywords: Data Mining, C4.5 Algorithm, loan feasibility
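
As a rough illustration of how C4.5 chooses a root node among categorical attributes, the sketch below computes the gain ratio of each attribute and selects the largest. The records, attribute names (`income`, `balance`), and the `eligible` label are invented for illustration and are not the KOPERIA data.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(rows, attribute, target):
    """Gain ratio of `attribute` with respect to the class column `target`."""
    labels = [row[target] for row in rows]
    # Group the class labels by the value the attribute takes.
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    total = len(rows)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = entropy([row[attribute] for row in rows])
    return info_gain / split_info if split_info > 0 else 0.0


def best_attribute(rows, attributes, target):
    """Attribute with the highest gain ratio, i.e. the candidate root node."""
    return max(attributes, key=lambda a: gain_ratio(rows, a, target))


# Hypothetical toy records, invented for illustration only.
rows = [
    {"income": ">5M", "balance": ">3M", "eligible": "yes"},
    {"income": ">5M", "balance": "1-3M", "eligible": "yes"},
    {"income": "3-5M", "balance": "<1M", "eligible": "no"},
    {"income": "3-5M", "balance": "1-3M", "eligible": "no"},
]

root = best_attribute(rows, ["income", "balance"], "eligible")
print(root)
```

In this toy data `income` separates the two classes perfectly, so it is selected as the root. A full C4.5 implementation would also handle numeric attributes, missing values, and pruning, which this sketch omits.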

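MEMPREDIKSI TINGKAT PEMINAT EKSTRAKURIKULER PADA SISWA SMK ANALISIS KESEHATAN ABDURRAB MENGGUNAKAN ALGORITMA C4.5 (STUDI KASUS: SMK ANALIS KESEHATAN ABDURRAB) [Predicting the Level of Interest in Extracurricular Activities among Students of SMK Analis Kesehatan Abdurrab Using the C4.5 Algorithm (Case Study: SMK Analis Kesehatan Abdurrab)]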

Rabit : Jurnal Teknologi dan Sistem Informasi Univrab ◽ 10.36341/rabit.v2i2.212 ◽ 2017 ◽ Vol 2 (2) ◽ pp. 220-233

Author(s): Luluk Elvitaria

Keyword(s): Data Mining ◽ Decision Tree ◽ Foreign Language ◽ Extracurricular Activities ◽ School Health ◽ Vocational School ◽ Foreign Languages ◽ Self Development ◽ Tree Algorithms

Extracurricular activities are additional school activities through which students can acquire or explore skills as part of their self-development. One such activity is the foreign language extracurricular program, which covers five languages: Arabic, English, German, Mandarin, and Japanese. To understand students' interest in extracurricular activities, a study was conducted on the level of interest in the foreign language program among students at the Vocational School Health Analyst Abdurrab. The level of interest in foreign languages is predicted through a data mining process using the C4.5 algorithm, which belongs to the family of decision tree algorithms. From this research, the school can find out the extent of students' interest in foreign languages, expand its extracurricular activities, and enable students to develop their interest in the foreign languages of their choice.

Analysis of Various Decision Tree Algorithms for Classification in Data Mining

International Journal of Computer Applications ◽ 10.5120/ijca2017913660 ◽ 2017 ◽ Vol 163 (8) ◽ pp. 15-19 ◽ Cited By ~ 27

Author(s): Bhumika Gupta ◽ Aditya Rawat ◽ Akshay Jain ◽ Arpit Arora ◽ Naresh Dhami

Keyword(s): Data Mining ◽ Decision Tree ◽ Tree Algorithms

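PENERAPAN DATA MINING MENGGUNAKAN ALGORITMA C4.5 TEHADAP PENGARUH PENJUALAN KOPI PADA PT. JPW INDONESIA [Application of Data Mining Using the C4.5 Algorithm to Factors Influencing Coffee Sales at PT. JPW Indonesia]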

Jurnal Sistem Informasi dan Informatika (Simika) ◽ 10.47080/simika.v3i1.836 ◽ 2020 ◽ Vol 3 (1) ◽ pp. 40-54

Author(s): Ikong Ifongki

Keyword(s): Data Mining ◽ Decision Tree ◽ Decision Rules ◽ Large Data ◽ Added Value ◽ Data Set ◽ Use Of Data ◽ Decision Tree Classification ◽ C4.5 Algorithm

Data mining is a series of processes for extracting added value from a data set in the form of knowledge that could not be identified manually. Data mining techniques are expected to reveal knowledge that was previously hidden in the data warehouse and turn it into valuable information. C4.5 is a widely used decision tree classification algorithm because it has key advantages over other algorithms: it produces decision trees that are easy to interpret, has an acceptable level of accuracy, is efficient in handling discrete attributes, and can handle both discrete and numeric attributes. The output of the C4.5 algorithm is a decision tree. A decision tree is a structure that divides a large data set into smaller sets of records by applying a series of decision rules; with each successive division, the members of the resulting sets become more similar to one another. This case study examines the factors affecting coffee sales by processing 106 records sampled from 1,087 coffee sales records at PT. JPW Indonesia. The sampled data are processed both manually using Microsoft Excel and with RapidMiner software. The results of the C4.5 calculation show that the Quantity and Price attributes strongly affect coffee sales, which is why sales at PT. JPW Indonesia are still often unstable.

Partition Real Data in Decision Tree Using Statistical Criterion

Applied Mechanics and Materials ◽ 10.4028/www.scientific.net/amm.380-384.1469 ◽ 2013 ◽ Vol 380-384 ◽ pp. 1469-1472

Author(s): Gui Jun Shan

Keyword(s): Machine Learning ◽ Data Mining ◽ Decision Tree ◽ Classification Accuracy ◽ Real Data ◽ Statistical Criterion ◽ Partition Scheme ◽ C4.5 Decision Tree ◽ Tree Algorithms ◽ Partition Method

Partition methods for real-valued data play an extremely important role in decision tree algorithms in data mining and machine learning, because these algorithms require attribute values to be discrete. In this paper, we propose a novel partition method for real-valued data in decision trees that uses a statistical criterion to find accurate merging intervals. In addition, we present a heuristic partition algorithm that achieves a desired partition result, with the aim of improving the performance of decision tree algorithms. Empirical experiments on real-valued UCI data show that the new algorithm generates a better partition scheme and improves the classification accuracy of the C4.5 decision tree more than existing algorithms do.
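
The abstract does not state which statistical criterion the paper constructs. To make the idea of merging adjacent intervals concrete, the sketch below substitutes the well-known ChiMerge approach, which repeatedly merges the pair of adjacent intervals whose class distributions are most similar under a chi-square statistic. It is not the paper's method, and the function names and the `max_intervals` stopping rule are assumptions made for this example.

```python
from collections import Counter


def chi_square(counts_a, counts_b, classes):
    """Chi-square statistic comparing the class counts of two adjacent intervals."""
    total = sum(counts_a.values()) + sum(counts_b.values())
    stat = 0.0
    for counts in (counts_a, counts_b):
        row_total = sum(counts.values())
        for c in classes:
            col_total = counts_a[c] + counts_b[c]
            expected = row_total * col_total / total
            if expected > 0:
                stat += (counts[c] - expected) ** 2 / expected
    return stat


def chimerge(values, labels, max_intervals):
    """Return the lower bounds of the intervals left after ChiMerge-style merging."""
    classes = sorted(set(labels))
    # Start with one interval per distinct value: [lower_bound, class counts].
    per_value = {}
    for v, y in zip(values, labels):
        per_value.setdefault(v, Counter())[y] += 1
    intervals = [[v, per_value[v]] for v in sorted(per_value)]
    while len(intervals) > max_intervals:
        scores = [
            chi_square(intervals[i][1], intervals[i + 1][1], classes)
            for i in range(len(intervals) - 1)
        ]
        i = scores.index(min(scores))  # most similar adjacent pair
        intervals[i][1] = intervals[i][1] + intervals[i + 1][1]
        del intervals[i + 1]
    return [lower for lower, _ in intervals]


# Hypothetical toy data: low values are class "a", high values are class "b".
cut_points = chimerge([1, 2, 3, 10, 11, 12], ["a", "a", "a", "b", "b", "b"], 2)
print(cut_points)
```

The returned lower bounds define the discrete bins that a decision tree algorithm such as C4.5 could then treat as categorical values.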
