Implementasi Algoritma Decision Tree (J.48) untuk Memprediksi Resiko Kredit pada BMT

2021 ◽  
Vol 9 (2) ◽  
pp. 91-99
Author(s):  
Atik Febriani ◽  
Violita Anggraini

Credit is the core business of a financial institution and strongly influences its growth and development. Weak supervision and management in the process of granting credit to customers can lead to a high rate of non-performing loans. This problem occurs at BMT X, a financial institution that grants credit to its customers. Data from 2019 show 600 applications for multipurpose credit; of that number, only about 76% showed good collectibility. This suboptimal credit collectibility forces BMT X to spend extra to collect the installments owed by debtors directly. Non-performing loans cause losses for the institution concerned. Therefore, when granting credit, BMT X must assess customer eligibility wisely. The aim of this study is to draft a policy for BMT X that minimizes prediction errors for customers in the non-performing-loan category. The technique used in this study is data mining classification with the J.48 algorithm. To measure how effectively an attribute classifies the set of data samples, the attribute with the largest information gain is selected and placed at the root node. The study produced six rules with an accuracy of 80.2%, which BMT X can use to extract information on customers' eligibility for credit. Keywords: J.48 algorithm, data mining, decision tree, credit risk
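As a rough illustration of the information-gain criterion described above, the sketch below computes entropy and information gain on a tiny invented credit dataset (the attributes, values, and labels are hypothetical, not the BMT X data) and picks the attribute that would sit at the root node:

```python
import math
from collections import Counter

# Hypothetical credit records; attributes, values, and labels are invented
# for illustration and are not the BMT X data.
records = [
    {"income": "high", "collateral": "yes", "status": "good"},
    {"income": "high", "collateral": "yes", "status": "good"},
    {"income": "low",  "collateral": "no",  "status": "bad"},
    {"income": "low",  "collateral": "yes", "status": "good"},
    {"income": "low",  "collateral": "no",  "status": "bad"},
]

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attribute, target="status"):
    """Entropy reduction obtained by splitting the rows on `attribute`."""
    gain = entropy([r[target] for r in rows])
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain

# The attribute with the largest information gain would be placed at the root node.
gains = {a: information_gain(records, a) for a in ("income", "collateral")}
print(gains, "-> root attribute:", max(gains, key=gains.get))
```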

2021 ◽  
Vol 7 ◽  
pp. e424
Author(s):  
G Sekhar Reddy ◽  
Suneetha Chittineni

Information efficiency is gaining importance in both the development and application sectors of information technology. Data mining is a computer-assisted process of massive data investigation that extracts meaningful information from datasets. The mined information is used in decision-making to understand the behavior of each attribute. Therefore, a new classification algorithm is introduced in this paper to improve information management. The classical C4.5 decision tree approach is combined with the Selfish Herd Optimization (SHO) algorithm to tune the gain of given datasets. The optimal weights for the information gain are updated based on SHO. Further, the dataset is partitioned into two classes based on quadratic entropy calculation and information gain. Decision tree gain optimization is the main aim of our proposed C4.5-SHO method. The robustness of the proposed method is evaluated on various datasets and compared with classifiers such as ID3 and CART. The accuracy and area under the receiver operating characteristic curve are estimated and compared with existing algorithms such as ant colony optimization, particle swarm optimization, and cuckoo search.
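The SHO tuning itself is not reproduced here; the fragment below only sketches the two impurity measures the abstract mentions, Shannon entropy and quadratic entropy (assuming the common definition 1 − Σ pᵢ²), on an invented class distribution:

```python
import math

def shannon_entropy(counts):
    """Classical entropy used by C4.5's information gain."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def quadratic_entropy(counts):
    """Quadratic entropy 1 - sum(p_i^2), equivalent to the Gini impurity."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

# Hypothetical two-class distribution: 30 examples of one class, 10 of the other.
counts = [30, 10]
print("Shannon:", shannon_entropy(counts), "quadratic:", quadratic_entropy(counts))
```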


2021 ◽  
Vol 5 (2) ◽  
pp. 187-202
Author(s):  
Alfin Yudistira ◽  
Muh Nurkhamid

ABSTRACT: Customs and Excise faces a big challenge to increase the hit rate of red-lane imports by 40% in accordance with the Blueprint for the 2014-2025 Ministry of Finance Institutional Transformation Program and international benchmarks. Through a qualitative study, this research aims to determine how data mining can be applied in the risk engine, based on import data, people's experience, and research results from the customs institutions of other countries. The data mining method used is CRISP-DM with the classification method and a decision tree model, using red-lane import data of KPU BC Type A Tanjung Priok for the period September-December 2019 and January 2020. The results show that the use of data mining can increase the hit rate of red-lane importation. The most relevant attribute for classifying the data is the sending country, which becomes the root node, while the import duty tariff attribute provides no information for the classification. This research is expected to give the KPU BC Type A Tanjung Priok a new perspective in its effort to improve the targeting and routing risk engines of Customs and Excise. Keywords: CRISP-DM, data mining, decision tree, hit rate, red-lane import.
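The study's risk-engine data are not available here; as a rough illustration of decision tree classification on categorical import attributes, the sketch below uses invented columns, values, and labels and reads off the attribute indicator chosen at the root node:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Invented red-lane declarations; columns, values, and labels are hypothetical.
data = pd.DataFrame({
    "sending_country": ["CN", "CN", "SG", "US", "SG", "US"],
    "hs_chapter":      ["84", "85", "84", "90", "85", "84"],
    "hit":             [1, 1, 0, 0, 1, 0],   # 1 = physical inspection found a violation
})

encoder = OneHotEncoder(handle_unknown="ignore")
X = encoder.fit_transform(data[["sending_country", "hs_chapter"]])
y = data["hit"]

# criterion="entropy" makes the splits information-gain based.
tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)

# The indicator chosen at the root is the most informative attribute value
# for this toy dataset (tree_.feature[0] is the root node's feature index).
print("root split on:", encoder.get_feature_names_out()[tree.tree_.feature[0]])
```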


2020 ◽  
Vol 8 (2) ◽  
pp. 23-39
Author(s):  
Hadi Khalilia ◽  
Thaer Sammar ◽  
Yazeed Sleet

Data mining is an important field that has been widely used in different domains; one of these fields is educational data mining. In this study, we apply machine learning models to data obtained from Palestine Technical University-Kadoorie (PTUK) in Tulkarm for students in the department of computer engineering and applied computing. Students in both fields study the same major courses, C++ and Java, so we focused on these courses to predict students' performance. The goal of our study is to predict students' performance in the major, measured by GPA. Many techniques are used in the educational data mining field; we applied three commonly used models to the obtained data: the decision tree with the information gain measure, the decision tree with the Gini index measure, and the naive Bayes model. We chose these models because they are efficient and fast in classification and prediction. The results suggest that the decision tree with the information gain measure outperforms the other models with an accuracy of 0.66. We took a deeper look at key features used to train our models, namely the branch of study at school, the field of study at the university, and whether or not the student has a scholarship. These features influence the prediction; for example, the accuracy of the decision tree with the information gain measure increases to 0.71 when applied to the subset of students who studied in the scientific branch at high school. This study is important for both the students and the higher management of PTUK, since the university will be able to make predictions about student performance. In the experiments carried out, the model's predictions were in line with the actual expectations.
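The PTUK records are not available here; the sketch below only illustrates the three-model comparison on invented student data (an ordinal encoding is used for simplicity, so it is not a faithful reproduction of the study's setup):

```python
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

# Invented student records standing in for the PTUK data.
students = pd.DataFrame({
    "school_branch": ["scientific", "literary", "scientific", "vocational",
                      "scientific", "literary", "vocational", "scientific"],
    "field":         ["computer_eng", "applied_comp", "applied_comp", "computer_eng",
                      "computer_eng", "applied_comp", "applied_comp", "computer_eng"],
    "scholarship":   ["yes", "no", "no", "yes", "no", "yes", "no", "yes"],
    "gpa_class":     ["high", "low", "high", "low", "high", "low", "low", "high"],
})

X = OrdinalEncoder().fit_transform(students.drop(columns="gpa_class"))
y = students["gpa_class"]

models = {
    "decision tree (information gain)": DecisionTreeClassifier(criterion="entropy", random_state=0),
    "decision tree (Gini index)":       DecisionTreeClassifier(criterion="gini", random_state=0),
    "naive Bayes":                      CategoricalNB(min_categories=3),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=2)
    print(f"{name}: mean accuracy {scores.mean():.2f}")
```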


2012 ◽  
Vol 532-533 ◽  
pp. 1685-1690 ◽  
Author(s):  
Zhi Kang Luo ◽  
Huai Ying Sun ◽  
De Wang

This paper presents an improved SPRINT algorithm. The original SPRINT is a scalable, parallelizable decision tree algorithm that is popular in the data mining and machine learning communities. To improve its efficiency, we first select the candidate splitting attributes and identify the best one by computing the information gain ratio of each attribute; after that, we calculate the best split point of the best splitting attribute only. Because this avoids many calculations on the other attributes, the improved algorithm effectively reduces computation.
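A simplified sketch of the selection order described above, first ranking attributes by gain ratio and then searching the binary split of the winning attribute only; it does not reproduce SPRINT's attribute lists or histograms, and all data are invented:

```python
import math
from collections import Counter
from itertools import combinations

# Toy categorical dataset; attributes, values, and labels are invented.
rows = [
    {"income": "high",   "housing": "own",  "label": "good"},
    {"income": "high",   "housing": "rent", "label": "good"},
    {"income": "medium", "housing": "own",  "label": "good"},
    {"income": "low",    "housing": "rent", "label": "bad"},
    {"income": "low",    "housing": "own",  "label": "bad"},
    {"income": "medium", "housing": "rent", "label": "bad"},
]

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, attr, target="label"):
    """Information gain ratio of a multi-way split on a categorical attribute."""
    gain, split_info = entropy([r[target] for r in rows]), 0.0
    for v in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == v]
        p = len(subset) / len(rows)
        gain -= p * entropy(subset)
        split_info -= p * math.log2(p)
    return gain / split_info if split_info else 0.0

# Step 1: choose the best splitting attribute by gain ratio.
best_attr = max(("income", "housing"), key=lambda a: gain_ratio(rows, a))

# Step 2: search the best binary split (value subset) of that attribute only,
# so no split-point search is spent on the losing attributes.
base = entropy([r["label"] for r in rows])
values = sorted({r[best_attr] for r in rows})

def split_gain(subset):
    left = [r["label"] for r in rows if r[best_attr] in subset]
    right = [r["label"] for r in rows if r[best_attr] not in subset]
    return base - (len(left) * entropy(left) + len(right) * entropy(right)) / len(rows)

candidates = [set(c) for k in range(1, len(values)) for c in combinations(values, k)]
best_subset = max(candidates, key=split_gain)
print("split on", best_attr, "sending", best_subset, "to the left branch")
```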


Author(s):  
Ricardo Timarán Pereira

Abstract: Decision tree classification is the most used and popular model because it is simple and easy to understand. Calculating the measure that selects, at each node, the attribute with the greatest power to classify over the set of class-attribute values is the most expensive step of the algorithm. To compute this measure, the raw data are not needed, only statistics on the number of records in which each condition attribute combines with the class attribute. Decision tree classification algorithms include ID3, C4.5, SPRINT, and SLIQ; however, none of these algorithms is based on relational algebraic operators or implemented with SQL primitives. This paper presents Mate-tree, an algorithm for the classification data mining task based on the relational algebraic operators Mate, Entro, Gain, and Describe Classifier, implemented in the SQL Select clause with the SQL primitives Mate by, Entro(), Gain(), and Describe Classification Rules. These facilitate the calculation of the information gain, the construction of the decision tree, and the tight coupling of this algorithm with a DBMS. Keywords: Decision Trees, Data Mining, Relational Algebraic Operators, SQL Primitives, Classification Task.
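The Mate by, Entro(), Gain(), and Describe Classification Rules primitives are specific to the paper and are not reproduced here. As a plain Python/pandas stand-in, the sketch below gathers the kind of statistics the text refers to, record counts for each condition-attribute/class-attribute combination, and computes information gain from those counts alone:

```python
import math
import pandas as pd

# Hypothetical relation with two condition attributes and one class attribute.
play = pd.DataFrame({
    "outlook": ["sunny", "sunny", "overcast", "rain", "rain", "overcast"],
    "windy":   ["no", "yes", "no", "no", "yes", "yes"],
    "play":    ["no", "no", "yes", "yes", "no", "yes"],
})

def entropy_from_counts(counts):
    """Entropy computed directly from class counts (a pandas Series)."""
    probs = counts / counts.sum()
    return -sum(p * math.log2(p) for p in probs)

def gain_from_counts(df, attr, target="play"):
    """Information gain derived only from grouped counts, not from the raw tuples."""
    # Equivalent to: SELECT attr, target, COUNT(*) FROM play GROUP BY attr, target;
    stats = df.groupby([attr, target]).size()
    class_counts = df.groupby(target).size()
    gain, total = entropy_from_counts(class_counts), len(df)
    for _, sub in stats.groupby(level=0):
        gain -= sub.sum() / total * entropy_from_counts(sub)
    return gain

print({a: gain_from_counts(play, a) for a in ("outlook", "windy")})
```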


Author(s):  
Y. Fakir ◽  
M. Azalmad ◽  
R. Elaychi

Data mining is a process of exploring large amounts of data to find patterns that support decision-making. One of the techniques used in decision-making is classification. Data classification is a form of data analysis used to extract models describing important data classes. There are many classification algorithms, and each classifier uses an algorithm to assign objects to predefined classes. The decision tree is one important technique, which builds a tree structure by incrementally breaking the dataset down into smaller subsets. Decision trees can be built with popular algorithms such as ID3, C4.5, and CART. The present study considers the ID3 and C4.5 algorithms to build a decision tree using the "entropy" and "information gain" measures, the basic components behind the construction of the classifier model.
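For illustration, a compact ID3-style recursion built on the entropy and information gain measures mentioned above (toy data; none of C4.5's refinements such as gain ratio, pruning, or continuous attributes are included):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr, target):
    gain = entropy([r[target] for r in rows])
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain

def id3(rows, attributes, target):
    labels = [r[target] for r in rows]
    # Stop when the node is pure or no attributes remain: return the majority class.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    branches = {}
    for value in {r[best] for r in rows}:
        subset = [r for r in rows if r[best] == value]
        branches[(best, value)] = id3(subset, [a for a in attributes if a != best], target)
    return branches

# Toy weather-style data; values are invented for illustration.
data = [
    {"outlook": "sunny",    "windy": "no",  "play": "no"},
    {"outlook": "sunny",    "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no",  "play": "yes"},
    {"outlook": "rain",     "windy": "no",  "play": "yes"},
    {"outlook": "rain",     "windy": "yes", "play": "no"},
]
print(id3(data, ["outlook", "windy"], "play"))
```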


Author(s):  
Mambang Mambang ◽  
Finki Dona Marleny

Before institutions that train health workers begin a new academic year, the first step is the selection of new students drawn from graduates of general or equivalent vocational secondary education. This admission selection aims to screen prospective students from various backgrounds against the standards set by the institution. This study asks how accurately the C4.5 algorithm can predict the selection outcome of prospective new students. The decision tree model is a classification prediction method that builds a tree consisting of a root node, internal nodes, and terminal nodes. Based on the experiments and evaluation performed, the C4.5 algorithm with the Uncertainty criterion achieved an accuracy of 80.39%, precision of 94.44%, and recall of 75.00%, while the C4.5 algorithm with the Information Gain Ratio criterion achieved an accuracy of 88.24%, precision of 98.28%, and recall of 83.82%.
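For reference, the reported figures are the standard classification metrics; a minimal sketch of how they are computed from predicted and actual labels (the labels below are invented):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical labels: 1 = passed the selection, 0 = did not pass.
y_true = [1, 1, 1, 1, 0, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 1, 0, 1, 0, 1, 0, 1]

print("accuracy :", accuracy_score(y_true, y_pred))   # (TP + TN) / all
print("precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("recall   :", recall_score(y_true, y_pred))     # TP / (TP + FN)
```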


2017 ◽  
Vol 14 (1) ◽  
pp. 7-12 ◽  
Author(s):  
Xiaoqi Liu

As the level of informatization in teaching management rises, network-based teaching evaluation systems have been widely used and a large amount of raw evaluation data has accumulated. Taking the college's teaching evaluation data from the last five years as its basis, this research analyzes teachers' personal factors and teaching operation factors with the decision tree ID3 data mining algorithm. By calculating the information entropy and information gain of each factor, the corresponding decision tree is obtained. In this way the teaching evaluation results are genuinely used rather than remaining a mere formality, providing a solid basis for effective and scientific teaching evaluation.
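For reference, the information entropy and information gain that ID3 computes for each factor follow the standard definitions (the notation here is generic, not the paper's):

```latex
\[
  H(S) = -\sum_{i=1}^{c} p_i \log_2 p_i ,
  \qquad
  \operatorname{Gain}(S, A) = H(S) - \sum_{v \in \operatorname{Values}(A)} \frac{|S_v|}{|S|}\, H(S_v)
\]
```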


Author(s):  
Kissinger Sunday ◽  
Patrick Ocheja ◽  
Sadiq Hussain ◽  
Solomon Sunday Oyelere ◽  
Balogun Oluwafemi Samson ◽  
...  

In this research, we aggregated students' log data, such as Class Test Score (CTS), Assignment Completed (ASC), Class Lab Work (CLW), and Class Attendance (CATT), from the Department of Mathematics, Computer Science Unit, Usmanu Danfodiyo University, Sokoto, Nigeria. We employed data mining techniques, namely the ID3 and J48 decision tree algorithms, to analyze these data, comparing the algorithms on 239 classification instances. The experimental results show that the J48 algorithm has higher accuracy in the classification task than the ID3 algorithm. The Information Gain and Gain Ratio feature evaluators were also compared; both were applied with a ranker search method, and the results confirmed that the two evaluators derived the same set of attributes with a slight deviation in ranking. From the results, 67.36 percent of students failed the course titled Introduction to Computer Programming, while 32.64 percent passed. Since CATT has the highest gain value in our analysis, we concluded that it is largely responsible for the success or failure of the students.
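A hedged sketch of the kind of ranking the two evaluators produce, using invented, discretized values in place of the actual CTS/ASC/CLW/CATT records:

```python
import math
from collections import Counter

# Invented discretized log data; column names mirror the attributes in the text.
rows = [
    {"CTS": "high", "ASC": "yes", "CLW": "good", "CATT": "high", "result": "pass"},
    {"CTS": "low",  "ASC": "no",  "CLW": "poor", "CATT": "low",  "result": "fail"},
    {"CTS": "high", "ASC": "yes", "CLW": "poor", "CATT": "high", "result": "pass"},
    {"CTS": "low",  "ASC": "yes", "CLW": "good", "CATT": "low",  "result": "fail"},
    {"CTS": "high", "ASC": "no",  "CLW": "good", "CATT": "low",  "result": "fail"},
    {"CTS": "low",  "ASC": "no",  "CLW": "poor", "CATT": "high", "result": "pass"},
]

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain_and_ratio(rows, attr, target="result"):
    """Return (information gain, gain ratio) of one attribute."""
    gain, split_info = entropy([r[target] for r in rows]), 0.0
    for v in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == v]
        p = len(subset) / len(rows)
        gain -= p * entropy(subset)
        split_info -= p * math.log2(p)
    return gain, (gain / split_info if split_info else 0.0)

scores = {a: info_gain_and_ratio(rows, a) for a in ("CTS", "ASC", "CLW", "CATT")}
by_gain = sorted(scores, key=lambda a: scores[a][0], reverse=True)
by_ratio = sorted(scores, key=lambda a: scores[a][1], reverse=True)
print("ranked by information gain:", by_gain)
print("ranked by gain ratio      :", by_ratio)
```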


2019 ◽  
Vol 15 (2) ◽  
pp. 275-280
Author(s):  
Agus Setiyono ◽  
Hilman F Pardede

It is now common for a cellphone to receive spam messages, and the large number of received messages makes it difficult for a human to classify them as spam or not spam. One way to overcome this problem is to use data mining for automatic classification. In this paper, we investigate several data mining techniques, namely Support Vector Machine, Multinomial Naïve Bayes, and Decision Tree, for automatic spam detection. Our experimental results show that the Support Vector Machine is the best of the three evaluated algorithms, achieving 98.33% accuracy, while Multinomial Naïve Bayes achieves 98.13% and Decision Tree 97.10%.
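A minimal sketch of such a comparison, with a few invented messages standing in for a real SMS corpus (texts, labels, and resulting scores are illustrative only):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Invented messages; a real experiment would use an SMS spam corpus.
texts = [
    "WIN a free prize now, claim your reward",
    "Lunch at noon tomorrow?",
    "URGENT: your account needs verification, click the link",
    "Can you send me the report today",
    "Congratulations, you have been selected for a cash offer",
    "See you at the meeting",
]
labels = ["spam", "ham", "spam", "ham", "spam", "ham"]

models = {
    "SVM":            LinearSVC(),
    "Multinomial NB": MultinomialNB(),
    "Decision Tree":  DecisionTreeClassifier(random_state=0),
}
for name, clf in models.items():
    pipe = make_pipeline(CountVectorizer(), clf).fit(texts, labels)
    # Training accuracy only, because the toy corpus is too small for a held-out split.
    print(name, pipe.score(texts, labels))
```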

