Performance evaluation of decision tree classification algorithms using fraud datasets

2020 ◽  
Vol 9 (6) ◽  
pp. 2518-2525
Author(s):  
Eddie Bouy B. Palad ◽  
Mary Jane F. Burden ◽  
Christian Ray Dela Torre ◽  
Rachelle Bea C. Uy

Text mining is one way of extracting knowledge and uncovering hidden relationships among data using artificial intelligence methods. Previous research has highlighted the benefits of various techniques; however, the lack of literature focusing on cybercrime implies that data mining is underused in facilitating cybercrime investigations in the Philippines. As a continuation of a prior study, this study therefore classifies computer fraud or online scam data drawn from police incident reports and from narratives of scam victims. The dataset consists mainly of unstructured data comprising 49,822 words, mostly Filipino. Five decision tree algorithms, namely J48, Hoeffding Tree, Decision Stump, REPTree, and Random Forest, were employed and compared in terms of performance and prediction accuracy. The results show that J48 achieves the highest accuracy and the lowest error rate among the classifiers. The results were validated by police investigators, who likewise preferred J48 as a potential tool for cybercrime investigations. This indicates the importance of text mining for cybercrime investigation in the country. Future work could use different and more inclusive cybercrime datasets and other classification techniques in Weka or any other data mining tool.
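As a rough illustration of the kind of comparison described above, the Python sketch below ranks classifiers by accuracy and by error rate, taken here as the plain misclassification rate. The labels, predictions, and function names are hypothetical placeholders, not data or code from the study, and the error measures reported by Weka may differ from this simple one.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def rank_classifiers(y_true, predictions):
    """Return (name, accuracy, error_rate) tuples, best accuracy first."""
    rows = []
    for name, y_pred in predictions.items():
        acc = accuracy(y_true, y_pred)
        rows.append((name, acc, 1.0 - acc))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Placeholder labels and predictions (hypothetical, for illustration only).
y_true = ["scam", "scam", "other", "scam", "other"]
predictions = {
    "classifier_a": ["scam", "scam", "other", "scam", "scam"],
    "classifier_b": ["scam", "other", "other", "other", "scam"],
}
ranking = rank_classifiers(y_true, predictions)
```

The first entry of `ranking` is the classifier with the highest accuracy and, equivalently under this definition, the lowest error rate.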

2021 ◽  
pp. 1-10
Author(s):  
Chao Dong ◽  
Yan Guo

The wide application of artificial intelligence technology in various fields has accelerated the exploration of the hidden information behind large amounts of data. Data mining methods are increasingly expected to support effective research on higher education management, and the decision tree classification algorithm, with its high classification accuracy, intuitive decision results, and strong generalization ability, is a well-suited method for this purpose. To address the sensitivity of data processing and decision tree classification to noisy data, this paper proposes a variable precision rough set attribute selection criterion based on a scale function, which considers both the weighted approximation accuracy and the number of attribute values of the attribute. This improves robustness to noisy data, reduces bias in attribute selection, and improves classification accuracy. In addition, a suppression factor threshold, support, and confidence are introduced in the tree pre-pruning process, which simplifies the tree structure. Comparative experiments on standard data sets show that the improved algorithm outperforms other decision tree algorithms and can effectively realize the differentiated classification of higher education management.
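The abstract does not spell out its pre-pruning rule, so the following Python sketch only illustrates the general idea of stopping a split based on support and confidence. The function name, the default thresholds, and the exact stopping condition are assumptions, and the paper's suppression factor is not modeled here.

```python
from collections import Counter


def should_stop_splitting(node_labels, total_samples,
                          min_support=0.05, min_confidence=0.9):
    """Decide whether to turn a node into a leaf before splitting it.

    support    = share of all training samples that reach this node
    confidence = share of the node's samples in its majority class
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    if not node_labels:
        return True  # nothing left to split
    support = len(node_labels) / total_samples
    majority_count = Counter(node_labels).most_common(1)[0][1]
    confidence = majority_count / len(node_labels)
    # Stop when the node is too small to trust or is already nearly pure.
    return support < min_support or confidence >= min_confidence
```

For example, with 100 training samples, a node holding 19 samples of one class and 1 of another would become a leaf (confidence 0.95), while an evenly mixed node of 10 samples would still be split.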


Author(s):  
Taşkın Dirsehan

The marketing concept has progressed through different phases of evolution. Customer relationship management is currently considered the latest era of marketing development. The main purpose of this approach is to build long-term, profitable relationships with customers, so companies need to know their customers better. This knowledge can be created through deeper analysis of companies' data with data mining tools. Companies that are able to use data mining tools will gain strong competitive advantages in their strategic decisions. The hotel industry is selected for this study because it provides a warehouse of customer comments from which valuable knowledge can be obtained if text mining is used appropriately as a data mining tool. This study therefore attempts to explain the stages of text mining using RapidMiner. As a result, different approaches based on customer satisfaction and dissatisfaction are discussed as ways to build competitive advantages.


2019 ◽  
Vol 2 (1) ◽  
Author(s):  
Teguh Budi Santoso ◽  
Dela Sekardiana

Credit granting at KOPERIA (Koperasi Warga Komplek Gandaria) is currently still based on an objective process. Cooperative managers often have difficulty determining whether credit should be granted, which leads to defaults on customers' credit installment payments at KOPERIA. This study aims to build a decision tree classification model to determine a customer's creditworthiness. The C4.5 algorithm is applied to the following attributes: income, divided into two categories (> 5 million and 3-5 million); balance, divided into three (> 3 million, 1-3 million, and < 1 million); loan amount, divided into three (1-4 months, 5-8 months, and 9-12 months); and requirements, with the values business capital, buying goods, and others. After the appropriate root nodes are determined, classification with the C4.5 algorithm achieves an accuracy of 97.5%, which suggests that the C4.5 algorithm is suitable for determining the feasibility of lending to KOPERIA customers.

Keywords: Data Mining, C4.5 Algorithm, loan feasibility


2017 ◽  
Vol 2 (2) ◽  
pp. 220-233
Author(s):  
Luluk Elvitaria

Extracurricular activities are additional school activities through which students can add to or explore their skills as part of their self-development. One such activity is foreign language study, covering five languages: Arabic, English, German, Mandarin, and Japanese. To gauge students' interest in extracurricular activities, a study was conducted on the level of interest in foreign language activities among students at the Vocational School Health Analyst Abdurrab. The level of interest in foreign languages is predicted through data mining using the C4.5 algorithm, which belongs to the family of decision tree algorithms. From this research, the school can find out the extent of students' interest in foreign languages and can expand extracurricular activities, and students can develop their interest in foreign languages as they wish.


2017 ◽  
Vol 163 (8) ◽  
pp. 15-19 ◽  
Author(s):  
Bhumika Gupta ◽  
Aditya Rawat ◽  
Akshay Jain ◽  
Arpit Arora ◽  
Naresh Dhami

2020 ◽  
Vol 3 (1) ◽  
pp. 40-54
Author(s):  
Ikong Ifongki

Data mining is a series of processes for extracting added value from a data set in the form of knowledge that could not previously be identified manually. Data mining techniques are expected to surface knowledge previously hidden in the data warehouse so that it becomes valuable information. C4.5 is a widely used decision tree classification algorithm because of its key advantages over other algorithms: it produces decision trees that are easy to interpret, has an acceptable level of accuracy, is efficient in handling discrete attributes, and can handle both discrete and numeric attributes. Like other decision tree techniques, C4.5 outputs a decision tree, a structure that divides a large data set into smaller sets of records by applying a series of decision rules, so that with each division the members of the resulting sets become more similar to one another. The case study examines the factors affecting coffee sales by processing 106 records out of 1,087 coffee sales records at PT. JPW Indonesia. The sampled data are calculated manually using Microsoft Excel and with RapidMiner software. The results of the C4.5 calculation show that the Quantity and Price attributes greatly affect coffee sales, so that sales at PT. JPW Indonesia are still often unstable.
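C4.5 is generally described as choosing split attributes by gain ratio, that is, information gain divided by split information. The Python sketch below shows one way that calculation might look for a single discrete attribute. The function names are illustrative, and this is not the Excel or RapidMiner procedure used in the study.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def gain_ratio(values, labels):
    """Gain ratio of one discrete attribute with respect to the class labels."""
    total = len(labels)
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    # Information gain: entropy before the split minus weighted entropy after.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0
```

An attribute whose values separate the classes perfectly into two equal halves gets a gain ratio of 1, while an attribute unrelated to the class gets 0. The attribute with the highest gain ratio would be chosen for the split.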


2013 ◽  
Vol 380-384 ◽  
pp. 1469-1472
Author(s):  
Gui Jun Shan

Partition methods for real-valued data play an extremely important role in decision tree algorithms in data mining and machine learning, because these algorithms require attribute values to be discrete. In this paper, we propose a novel partition method for real-valued data in decision trees using a statistical criterion. The method constructs a statistical criterion to find accurate merging intervals. In addition, we present a heuristic partition algorithm that achieves a desired partition result with the aim of improving the performance of decision tree algorithms. Empirical experiments on UCI real data show that the new algorithm generates a better partition scheme than existing algorithms, improving the classification accuracy of the C4.5 decision tree.
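The abstract does not state which statistical criterion the paper constructs. As a stand-in, the Python sketch below uses a chi-square test on adjacent intervals, in the spirit of the well-known ChiMerge approach, and stops at a fixed number of intervals rather than at a significance threshold. The function names and the `max_intervals` parameter are assumptions made for illustration.

```python
from collections import Counter


def chi_square(counts_a, counts_b, classes):
    """Chi-square statistic for the class counts of two adjacent intervals."""
    total = sum(counts_a.values()) + sum(counts_b.values())
    stat = 0.0
    for counts in (counts_a, counts_b):
        row_total = sum(counts.values())
        for c in classes:
            col_total = counts_a[c] + counts_b[c]
            expected = row_total * col_total / total
            if expected > 0:
                stat += (counts[c] - expected) ** 2 / expected
    return stat


def merge_intervals(values, labels, max_intervals=3):
    """Bottom-up discretization: merge the most similar adjacent intervals."""
    classes = sorted(set(labels))
    # Start with one interval per distinct value, holding its class counts.
    by_value = {}
    for value, label in zip(values, labels):
        by_value.setdefault(value, Counter())[label] += 1
    # Each interval is a (lower bound, class counts) pair, sorted by bound.
    intervals = [(v, by_value[v]) for v in sorted(by_value)]
    while len(intervals) > max_intervals:
        scores = [chi_square(intervals[i][1], intervals[i + 1][1], classes)
                  for i in range(len(intervals) - 1)]
        i = scores.index(min(scores))  # most similar adjacent pair
        merged = (intervals[i][0], intervals[i][1] + intervals[i + 1][1])
        intervals[i:i + 2] = [merged]
    return [lower for lower, _ in intervals]  # lower bound of each interval
```

Given values 1, 2, 3 labeled with one class and 10, 11, 12 labeled with another, asking for two intervals yields the lower bounds 1 and 10, so the cut falls between the two class groups.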

