Identification of Poison using C4.5 Algorithm

International Journal of Scientific Research in Science Engineering and Technology ◽

10.32628/ijsrset207247 ◽

2020 ◽

pp. 218-222

Author(s):

Lai Lai Yee ◽

Myo Ma Ma

Keyword(s):

Data Mining ◽

Test Data ◽

Knowledge Worker ◽

Training Data ◽

Independent Data ◽

Classification Rules ◽

Natural Evolution ◽

C4.5 Algorithm ◽

Data mining is the task of discovering interesting patterns in large amounts of data, where the data may be stored in databases, data warehouses, or other information repositories. It can be viewed as a result of the natural evolution of information technology. The key point is that data mining applies AI and statistical techniques to common business problems in a way that makes these techniques available to the skilled knowledge worker as well as to the trained statistics professional. This paper presents a classification system for toxicology using the C4.5 algorithm. First, the input data are randomly partitioned into two independent sets, a training set and a test set: two-thirds of the data are allocated to the training set and the remaining one-third to the test set. In the C4.5 algorithm process, the training data are used to derive classification rules with C4.5. In the classification process, the test data are used to estimate the accuracy of those rules. If the accuracy is considered acceptable, the rules can be applied to classify new data.
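The evaluation protocol in this abstract (random two-thirds/one-third holdout, then accuracy of the learned rules on the test set) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: records are assumed to be `(attributes, label)` pairs, and `holdout_split`, `accuracy`, the `dose` attribute and the threshold `rule` are all hypothetical stand-ins for the toxicology data and the rules C4.5 would learn.

```python
import random
from typing import Callable, List, Sequence, Tuple

Record = Tuple[dict, str]  # (attribute -> value mapping, class label)


def holdout_split(records: Sequence[Record], train_fraction: float = 2 / 3,
                  seed: int = 0) -> Tuple[List[Record], List[Record]]:
    """Randomly partition records into disjoint training and test lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(classify: Callable[[dict], str], test: Sequence[Record]) -> float:
    """Fraction of test records whose predicted label equals the true label."""
    if not test:
        raise ValueError("test set is empty")
    correct = sum(1 for features, label in test if classify(features) == label)
    return correct / len(test)


# Toy data: one hypothetical "dose" attribute with a threshold-defined label.
data = [({"dose": d}, "toxic" if d > 5 else "safe") for d in range(12)]
train, test = holdout_split(data)


def rule(features: dict) -> str:
    """Hypothetical rule standing in for the rules C4.5 would learn from `train`."""
    return "toxic" if features["dose"] > 5 else "safe"


print(len(train), len(test), accuracy(rule, test))  # 8 4 1.0
```

Seeding a private `random.Random` keeps the split reproducible without touching global random state; in practice the accuracy threshold for "acceptable" is a domain decision the abstract leaves open.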

DATA MINING ALGORITHM C4.5 CLASSIFICATION DETERMINATION CREDIT ELIGIBILITY FOR JAYA BERSAMA COOPERATIVES (KORJABE)

JURTEKSI ◽

10.33330/jurteksi.v8i1.1298 ◽

2021 ◽

Vol 8 (1) ◽

pp. 59-68

Author(s):

Christnatalis Christnatalis ◽

Roni Rayandi Saragih ◽

Bobby Christianto Tambunan

Keyword(s):

Data Mining ◽

Test Data ◽

Selection Method ◽

Training Data ◽

Classification Error ◽

Data Mining Algorithm ◽

Mining Method ◽

Data Mining Method ◽

Mining Algorithm ◽

C4.5 Algorithm

Abstract: This study uses the C4.5 classification algorithm to determine creditworthiness. Classification aims to divide objects into a number of categories called classes. In this study, the authors use data mining with the C4.5 algorithm as the selection method. The criteria used are loan installments, prospective customer income, loan term, and prospective customer status. The study produced a decision tree classification model with the C4.5 algorithm that falls in the Excellent Classification category, with an accuracy of 98.33% and a classification error of 1.67%, using 70% of the data for training and 30% for testing. The results show that the C4.5 algorithm can be used to determine the feasibility of granting credit to customers of Koperasi Jaya Bersama (KORJABE). Keywords: Analysis, Credit Eligibility, C4.5 Algorithm, Data Mining, Method
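C4.5 chooses the attribute to split on by gain ratio: information gain divided by split information. A minimal Python sketch for categorical attributes follows. The rows are hypothetical applicants; the attribute names loosely echo the study's criteria, but the values and labels are invented, and full C4.5 additionally handles numeric thresholds, missing values and pruning, which are omitted here.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows: List[Dict[str, str]], attr: str, target: str) -> float:
    """Gain ratio of splitting `rows` on the categorical attribute `attr`."""
    n = len(rows)
    partitions: Dict[str, List[str]] = {}
    for row in rows:
        partitions.setdefault(row[attr], []).append(row[target])
    # Information gain: entropy before the split minus weighted entropy after it.
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    gain = entropy([row[target] for row in rows]) - remainder
    # Split information penalises attributes that fragment the data into many values.
    split_info = -sum(len(p) / n * math.log2(len(p) / n) for p in partitions.values())
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical applicants; values are invented for illustration.
rows = [
    {"income": "high", "status": "married", "eligible": "yes"},
    {"income": "high", "status": "single", "eligible": "yes"},
    {"income": "low", "status": "married", "eligible": "no"},
    {"income": "low", "status": "single", "eligible": "no"},
]
best = max(["income", "status"], key=lambda a: gain_ratio(rows, a, "eligible"))
print(best)  # income
```

Here `income` separates the classes perfectly (gain ratio 1.0) while `status` carries no information (0.0), so a C4.5-style learner would place `income` at the root.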

Application of the C4.5 Algorithm to Predict the Types of Disease in Pigs Based on Android

JELIKU (Jurnal Elektronik Ilmu Komputer Udayana) ◽

10.24843/jlk.2021.v10.i01.p14 ◽

2021 ◽

Vol 10 (1) ◽

pp. 105

Author(s):

I Gusti Ayu Purnami Indryaswari ◽

Ida Bagus Made Mahendra

Keyword(s):

Programming Language ◽

Test Data ◽

Training Data ◽

Data Sets ◽

Android Application ◽

C4.5 Algorithm ◽

Sqlite Database

Many Indonesians, especially in Bali, raise pigs as livestock. Pigs are susceptible to various diseases, and many pig deaths caused by disease have led to losses for breeders. The authors therefore built an Android application that predicts the type of disease in pigs by applying the C4.5 algorithm. C4.5 classifies data to obtain rules that are then used to make predictions. In this study, 50 training records covering 8 pig diseases and 31 disease symptoms were entered into the system, which processes them so that the Android application can predict the type of disease. Testing on 15 test records produced an accuracy of 86.7%. Feature testing of the application, built with the Kotlin programming language and an SQLite database, showed that it runs as expected.
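As a consistency check on the reported figure (an inference, not a count stated in the abstract), an accuracy of 86.7% on 15 test cases corresponds to 13 correct predictions:

```latex
\text{accuracy} = \frac{\text{correct predictions}}{\text{test cases}} = \frac{13}{15} \approx 0.867 = 86.7\%
```

The neighbouring counts give 14/15 ≈ 93.3% and 12/15 = 80.0%, so 13 is the only count that rounds to the reported value.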

Student Grouping System Based on Discipline Level Using the Naïve Bayes Classifier Method

Jurnal Teknologi Informasi dan Komunikasi (TIKomSiN) ◽

10.30646/tikomsin.v9i2.575 ◽

2021 ◽

Vol 9 (2) ◽

pp. 50

Author(s):

Budi Hartanto ◽

Sri Tomo

Keyword(s):

Test Data ◽

Educational Process ◽

Training Data ◽

Student Discipline ◽

Test Results ◽

Bayes Classifier ◽

Bayes Method ◽

Important Thing ◽

Class Test

Discipline is very important in the educational process, and it succeeds only if it is applied to students correctly. Student discipline means that every student follows the rules and orders set by the school. At SMK Muhammadiyah 2 Sukoharjo, student discipline has been declining, as shown by an increase in students' violation points. The purpose of this study was to apply the naïve Bayes method to classify student discipline levels at SMK Muhammadiyah 2 Sukoharjo. The resulting information can be used to identify which students need Counseling Guidance to give them direction and support. The attributes used are fighting, missing the morning assembly, not carrying out picket duty, absence without explanation, arriving late, and being noisy in class. Testing on 490 records, with 75% used as training data and 25% as test data, produced an accuracy of 76%.
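Naive Bayes scores each class by its prior probability times the per-attribute likelihoods, treating attributes as independent given the class. A minimal sketch for categorical attributes with Laplace (add-one) smoothing follows; the smoothing choice, the class name `CategoricalNaiveBayes`, and the toy records (attribute names loosely echo the abstract, values and labels are invented) are assumptions, not the authors' implementation.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Example = Tuple[Dict[str, str], str]  # (attribute -> value, class label)


class CategoricalNaiveBayes:
    """Naive Bayes over categorical attributes with Laplace (add-one) smoothing."""

    def fit(self, examples: List[Example]) -> "CategoricalNaiveBayes":
        self.class_counts = Counter(label for _, label in examples)
        self.total = len(examples)
        # counts[(attr, label)][value]: how often attr == value among examples of `label`
        self.counts: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
        self.values: Dict[str, Set[str]] = defaultdict(set)
        for features, label in examples:
            for attr, value in features.items():
                self.counts[(attr, label)][value] += 1
                self.values[attr].add(value)
        return self

    def log_posterior(self, features: Dict[str, str], label: str) -> float:
        """Unnormalised log P(label | features) under the independence assumption."""
        score = math.log(self.class_counts[label] / self.total)  # log prior
        for attr, value in features.items():
            k = len(self.values[attr])  # number of distinct values seen for attr
            smoothed = (self.counts[(attr, label)][value] + 1) / (self.class_counts[label] + k)
            score += math.log(smoothed)
        return score

    def predict(self, features: Dict[str, str]) -> str:
        return max(self.class_counts, key=lambda label: self.log_posterior(features, label))


# Hypothetical records: discipline level "low" or "high".
examples = [
    ({"late": "often", "fights": "yes"}, "low"),
    ({"late": "often", "fights": "no"}, "low"),
    ({"late": "rarely", "fights": "no"}, "high"),
    ({"late": "rarely", "fights": "no"}, "high"),
    ({"late": "rarely", "fights": "yes"}, "high"),
]
model = CategoricalNaiveBayes().fit(examples)
print(model.predict({"late": "often", "fights": "yes"}))  # low
print(model.predict({"late": "rarely", "fights": "no"}))  # high
```

Summing log probabilities instead of multiplying raw probabilities avoids numeric underflow when many attributes are used; smoothing keeps an unseen attribute value from zeroing out a class.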

Study of Potential Classification of Lost Students in College Based on Information Extraction on Text-Based Social Media; Case Study of Panca Budi Pembangunan University

International Journal of Research and Review ◽

10.52403/ijrr.20211140 ◽

2021 ◽

Vol 8 (11) ◽

pp. 325-331

Author(s):

Eko Hariyanto ◽

Sri Wahyuni ◽

Supina Batubara

Keyword(s):

Data Mining ◽

Social Media ◽

Text Mining ◽

Information Extraction ◽

Preventive Measure ◽

Drop Out ◽

Training Data ◽

Computational Algorithms ◽

Technological Readiness

The main problem studied is the large number of lost students, which harms universities because these students are difficult to monitor as a preventive measure. This research is therefore important so that higher education institutions can detect early (through classification) students who may not complete their studies on time or who will drop out (DO). Institutions, through related parties such as academic advisors and academic bureaus, can then take early preventive action by offering the best solution to the problems students face. This research aims to build a training data model consisting of academic and non-academic factors (including information extracted from social media). This model is then used as the basis for classifying students as likely to "graduate on time", "graduate not on time", or "DO". The approach is quantitative, using text mining algorithms to extract knowledge/information from social media for use in training, and data mining algorithms to classify students' potential to complete their studies. The mandatory external target in the first year is publication in a Scopus Q4 international journal, and in the second year publication in a Scopus Q3 international journal. Additional external targets in the first and second years are publications in international journals indexed by reputable indexers, ISBN teaching books, and copyrights. The technology readiness level (TKT) of this study reaches level 2: the formulation of technological concepts and applications for classifying students' potential to complete their studies using data mining. Keywords: [student lost, knowledge/information extraction, data classification, text mining, data mining].

Bio-Inspired Algorithms for Medical Data Analysis

Handbook of Research on Biomimicry in Information Retrieval and Knowledge Management - Advances in Web Technologies and Engineering ◽

10.4018/978-1-5225-3004-6.ch014 ◽

2018 ◽

pp. 251-275 ◽

Cited By ~ 1

Author(s):

Hanane Menad ◽

Abdelmalek Amine

Keyword(s):

Data Mining ◽

Data Analysis ◽

Social Behavior ◽

Medical Data ◽

The Other ◽

Data Sets ◽

Classification Rules ◽

Medical Data Mining ◽

Good Efficiency

Medical data mining has great potential for exploring hidden patterns in medical data sets, and these patterns can be used for clinical diagnosis. Bio-inspired algorithms are a new field of research whose main advantage is knitting together subfields related to connectionism, social behavior, and emergence. Briefly put, the field uses computers to model living phenomena and, at the same time, studies life to improve the use of computers. In this chapter, the authors apply four bio-inspired algorithms and metaheuristics to the classification of seven different real medical data sets. Two of these algorithms are based on similarity calculations between training and test data, while the other two are based on randomly generating a population to construct classification rules. The results showed that bio-inspired algorithms are very efficient for supervised classification of medical data.

CLASSIFICATION OF FINAL PROJECT (UNDERGRADUATE THESIS) DOCUMENTS USING K-NEAREST NEIGHBOR

JISKA (Jurnal Informatika Sunan Kalijaga) ◽

10.14421/jiska.2019.41-07 ◽

2019 ◽

Vol 4 (1) ◽

pp. 69

Author(s):

Kitami Akromunnisa ◽

Rahmat Hidayat

Keyword(s):

Test Data ◽

Cross Validation ◽

Nearest Neighbor ◽

Data Distribution ◽

Training Data ◽

K Nearest Neighbor ◽

Electronic Documents ◽

Digital Version ◽

Abstract Data

Various scientific works by academics, such as theses, research reports, and practical work reports, are available in digital form. In general, however, this availability is not matched by growth in the information or knowledge extracted from these electronic documents. This study aims to classify the abstracts of informatics engineering theses using the K-Nearest Neighbor algorithm. The data comprise 50 Indonesian-language abstracts, 454 English abstracts, and 504 titles. Each data set is divided into training data and test data, and the test data are classified automatically with the resulting classifier model. Without stemming, classification of the Indonesian abstracts achieved its highest accuracy, 100.0%, with a 9:1 split, compared with 90.0% for 8:2, 80.0% for 7:3, 60.0% for 6:4, and 80.0% with k-fold cross-validation.
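K-Nearest Neighbor labels a test document by majority vote among the k most similar training documents. The abstract does not say how documents are represented or compared, so the sketch below assumes bag-of-words term counts, no stemming, and cosine similarity; the helper names and the toy training titles are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def bag_of_words(text: str) -> Counter:
    """Lower-cased whitespace tokens with counts; no stemming is applied."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def knn_classify(query: str, training: List[Tuple[str, str]], k: int = 3) -> str:
    """Majority label among the k training documents most similar to `query`."""
    q = bag_of_words(query)
    ranked = sorted(training, key=lambda doc: cosine(q, bag_of_words(doc[0])), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labelled thesis titles.
training = [
    ("routing protocol for wireless sensor network", "network"),
    ("network packet loss analysis", "network"),
    ("bandwidth measurement in computer network", "network"),
    ("database query optimization", "database"),
    ("relational database schema design", "database"),
    ("sql query performance on large database", "database"),
]
print(knn_classify("distributed database query optimization", training))  # database
```

The same function can be evaluated at the 9:1, 8:2, 7:3 and 6:4 splits the abstract reports by changing which documents are passed as `training`.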

Application of the C5.0 Algorithm for Predicting Student Pass Outcomes in the Computer System Architecture Course

JURNAL MEDIA INFORMATIKA BUDIDARMA ◽

10.30865/mib.v5i3.3116 ◽

2021 ◽

Vol 5 (3) ◽

pp. 1166

Author(s):

Muchamad Sobri Sungkar ◽

M Taufik Qurohman

Keyword(s):

Data Mining ◽

Decision Tree ◽

Extraction Process ◽

Study Program ◽

Data Set ◽

C4.5 Algorithm ◽

Previous Algorithm ◽

Process Prediction ◽

Computer System Architecture

Computer system architecture is one of the required courses in the informatics engineering study program. In the study program, each student's passing of the course is an important aspect that must be evaluated every semester. Students passing the course indicates that the learning process is going well and that students can understand the material presented by the lecturer in charge. Whether a student passes can be predicted from the student's habit patterns. Data mining is one way to discover such habit patterns in the data that have been collected. Data mining is an extraction process on a collection of data that produces valuable information that companies, agencies, or organizations can use in decision-making. Pass prediction with data mining can be solved by classifying the data set. The C5.0 algorithm is an improvement on the C4.5 algorithm: the process is almost the same, but C5.0 has advantages over its predecessor. C5.0 produces a decision tree or a set of rules formed from entropy or information gain values. Prediction is based on C5.0 classification using the attributes Attendance Value, Assignment Value, UTS (midterm exam) Value, and UAS (final exam) Value. The final result of the C5.0 classification process is a decision tree with rules. The C5.0 algorithm achieved a high accuracy of 93.33%.
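Each root-to-leaf path of a decision tree such as the one C5.0 produces can be read off as an IF ... THEN rule. A minimal sketch of that conversion follows, assuming a binary tree stored as nested dicts with numeric thresholds; the tree itself, including its use of UAS and Attendance and the thresholds 60 and 75, is invented for illustration and is not the paper's result.

```python
from typing import List, Optional, Union

# A leaf is a class label; an internal node splits a numeric attribute at a threshold.
Tree = Union[str, dict]


def tree_to_rules(tree: Tree, conditions: Optional[List[str]] = None) -> List[str]:
    """Flatten a binary decision tree into one IF ... THEN ... rule per leaf."""
    conditions = conditions or []
    if isinstance(tree, str):  # leaf reached: emit the accumulated path as a rule
        return [f"IF {' AND '.join(conditions) or 'TRUE'} THEN {tree}"]
    attr, t = tree["attr"], tree["threshold"]
    return (tree_to_rules(tree["le"], conditions + [f"{attr} <= {t}"])
            + tree_to_rules(tree["gt"], conditions + [f"{attr} > {t}"]))


# Hypothetical tree over two of the study's attributes.
tree = {
    "attr": "UAS", "threshold": 60,
    "le": {"attr": "Attendance", "threshold": 75, "le": "fail", "gt": "pass"},
    "gt": "pass",
}
for rule in tree_to_rules(tree):
    print(rule)
```

Because every rule corresponds to exactly one leaf, the rule set is mutually exclusive and covers every input, just like the tree it came from.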

Feature Selection in Classification of Blood Sugar Disease Using Particle Swarm Optimization (PSO) on C4.5 Algorithm

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v4i3.1881 ◽

2020 ◽

Vol 4 (3) ◽

pp. 569-575

Author(s):

Dwi Meylitasari Tarigan ◽

Dian Palupi Rini ◽

Samsuryadi

Keyword(s):

Data Mining ◽

Particle Swarm Optimization ◽

Blood Sugar ◽

Blood Sugar Level ◽

Public Awareness ◽

Particle Swarm ◽

Data Mining Technique ◽

Swarm Optimization ◽

C4.5 Algorithm

Diabetes Mellitus (DM) is a disease caused by blood sugar levels rising above the maximum limit. Food with uncontrolled sugar content can cause a drastic increase in blood sugar. Efforts are needed to increase public awareness of controlling blood sugar and of the risks of rising blood sugar levels, so that preventive and early detection measures can be determined. Data mining is an information technology widely used in the health sector as a decision-making aid for predicting and diagnosing various diseases. This research aims to optimize feature selection for data mining classification with the C4.5 algorithm using Particle Swarm Optimization (PSO) to detect blood sugar levels in patients. The dataset describes the effect of physical activity on blood sugar levels at H. Abdul Manan Simatupang Kisaran Regional Public Hospital and contains 42 records with 10 attributes. The results show that PSO increased the accuracy of C4.5 from 86% to 95% and the AUC value from 0.917 to 0.950. PSO reduced the 10 attributes to 7, which were used to predict blood sugar level. The authors conclude that C4.5 with PSO may provide the best solution for accurately detecting blood sugar levels.
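PSO-based feature selection is commonly implemented as binary PSO: each particle is a 0/1 mask over the attributes, velocities are pulled toward the particle's own best mask and the swarm's best mask, and a sigmoid turns each velocity into the probability that a feature is kept. The sketch below follows that general scheme; the parameter values are illustrative, and `toy_fitness` (with its hypothetical `USEFUL` feature set) stands in for the C4.5 accuracy the paper would use as fitness.

```python
import math
import random
from typing import Callable, List, Tuple


def binary_pso(fitness: Callable[[List[int]], float], n_features: int,
               n_particles: int = 20, iterations: int = 60, w: float = 0.7,
               c1: float = 1.5, c2: float = 1.5, v_max: float = 4.0,
               seed: int = 0) -> Tuple[List[int], float]:
    """Binary PSO: each particle is a 0/1 mask saying which features to keep."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(n_features):
                # Pull the velocity toward the particle's own best and the swarm's best.
                v = (w * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))
                # Sigmoid transfer: velocity -> probability that the bit is 1.
                pos[i][d] = 1 if rng.random() < 1 / (1 + math.exp(-vel[i][d])) else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


USEFUL = {0, 2, 5}  # hypothetical: the features that actually help the classifier


def toy_fitness(mask: List[int]) -> float:
    """Stand-in for classifier accuracy: reward useful features, penalise extra ones."""
    hits = sum(mask[j] for j in USEFUL)
    extras = sum(bit for j, bit in enumerate(mask) if j not in USEFUL)
    return hits - 0.1 * extras


best_mask, best_fit = binary_pso(toy_fitness, n_features=10)
print(best_mask, best_fit)
```

In a real wrapper setup the fitness call would train and evaluate C4.5 on the selected columns, which dominates the run time; the small penalty term is one way to prefer smaller attribute subsets such as the 7 of 10 reported.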

A Fast Boosting Based Incremental Genetic Algorithm for Mining Classification Rules in Large Datasets

International Journal of Applied Evolutionary Computation ◽

10.4018/jaec.2011010104 ◽

2011 ◽

Vol 2 (1) ◽

pp. 49-58

Author(s):

Periasamy Vivekanandan ◽

Raju Nedunchezhian

Keyword(s):

Genetic Algorithm ◽

Training Data ◽

Classification Rule ◽

Rule Discovery ◽

Classification Rules ◽

Search Technique ◽

Natural Evolution ◽

Ensemble Of Classifiers ◽

Data Set ◽

Mining Community

A genetic algorithm (GA) is a search technique based purely on the process of natural evolution. It is widely used by the data mining community for classification rule discovery in complex domains. During learning, it makes several passes over the data set to determine the accuracy of candidate rules, which makes it an extremely I/O-intensive and slow process. GAs are particularly difficult to apply when the training data set becomes too large and is not fully available. This paper proposes an incremental genetic algorithm based on boosting, which constructs an ensemble of weak classifiers in a fast, incremental manner and thus aims to reduce the learning cost considerably.
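The abstract does not detail its boosting scheme, so as general background the sketch below shows the standard discrete AdaBoost reweighting step, not the paper's incremental GA. After each weak classifier is built, misclassified samples gain weight so the next learner concentrates on them, and the classifier receives a vote weight `alpha` that grows as its weighted error shrinks.

```python
import math
from typing import List, Tuple


def adaboost_reweight(weights: List[float], correct: List[bool]) -> Tuple[List[float], float]:
    """One AdaBoost round: return updated, normalised sample weights and the
    weak classifier's vote weight alpha."""
    err = sum(w for w, ok in zip(weights, correct) if not ok) / sum(weights)
    if not 0 < err < 0.5:
        raise ValueError("weak learner must have 0 < weighted error < 0.5")
    alpha = 0.5 * math.log((1 - err) / err)
    # Misclassified samples are scaled up by e^alpha, correct ones down by e^-alpha.
    new = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    total = sum(new)
    return [w / total for w in new], alpha


# Four equally weighted samples; the weak classifier gets the last one wrong.
weights, alpha = adaboost_reweight([0.25] * 4, [True, True, True, False])
print([round(w, 4) for w in weights], round(alpha, 4))
```

After the update the misclassified sample holds half of the total weight, a general property of this reweighting rule.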

Use of Data Mining for Prediction of Customer Loyalty

CommIT (Communication and Information Technology) Journal ◽

10.21512/commit.v10i1.1660 ◽

2015 ◽

Vol 10 (1) ◽

pp. 41 ◽

Cited By ~ 3

Author(s):

Andri Wijaya ◽

Abba Suganda Girsang

Keyword(s):

Data Mining ◽

Customer Loyalty ◽

Classification Accuracy ◽

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

Training Data ◽

Training Set ◽

Use Of Data ◽

C4.5 Algorithm

This article analyzes customer loyalty with three data mining methods, the C4.5, Naive Bayes, and Nearest Neighbor algorithms, on real-world empirical data. The data contain ten attributes related to customer loyalty and were obtained from a national multimedia company in Indonesia. The dataset contains 2269 records. The study also evaluates how the size of the training data affects classification accuracy. The results suggest that the C4.5 algorithm produces the highest classification accuracy, about 81%, followed by Naive Bayes (76%) and Nearest Neighbor (55%). The numerical evaluation also suggests that a training-set proportion of 80% is optimal.
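The training-size experiment described above can be framed as a sweep: cut one fixed shuffle at several training proportions, train on the first part, and score on the rest. In this minimal sketch the learner is a pluggable callable; `majority_class` is a hypothetical baseline standing in for C4.5, Naive Bayes or Nearest Neighbor, and the 100-record data set is invented.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Example = Tuple[dict, str]
Model = Callable[[dict], str]


def sweep_train_proportions(data: Sequence[Example], train: Callable[[List[Example]], Model],
                            proportions: Sequence[float], seed: int = 0) -> Dict[float, float]:
    """Test-set accuracy for each training proportion, all cut from one fixed shuffle."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    results: Dict[float, float] = {}
    for p in proportions:
        cut = round(len(shuffled) * p)
        train_set, test_set = shuffled[:cut], shuffled[cut:]
        if not train_set or not test_set:
            raise ValueError(f"proportion {p} leaves an empty training or test set")
        model = train(train_set)
        results[p] = sum(model(x) == y for x, y in test_set) / len(test_set)
    return results


def majority_class(train_set: List[Example]) -> Model:
    """Baseline learner: always predicts the most common training label."""
    label = Counter(y for _, y in train_set).most_common(1)[0][0]
    return lambda features: label


# Hypothetical data set: 70 loyal and 30 non-loyal customers with a dummy attribute.
data = [({"id": i}, "loyal" if i < 70 else "not_loyal") for i in range(100)]
results = sweep_train_proportions(data, majority_class, [0.5, 0.6, 0.7, 0.8])
print(results)
```

Using one shuffle for every proportion means larger training sets are supersets of smaller ones, which isolates the effect of training size; averaging over several seeds would make the comparison less sensitive to a single lucky split.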
