Identification of Poison using C4.5 Algorithm

Author(s): Lai Lai Yee, Myo Ma Ma

Data mining is the task of discovering interesting patterns in large amounts of data, where the data can be stored in databases, data warehouses or other information repositories. It can be viewed as a result of the natural evolution of information technology. The key point is that data mining applies AI and statistical techniques to common business problems in a way that makes these techniques available to the skilled knowledge worker as well as to the trained statistician. This paper presents a classification system for toxicology using C4.5. First, the input data are randomly partitioned into two independent sets, a training set and a test set: two thirds of the data are allocated to the training set and the remaining third to the test set. In the C4.5 algorithm process, the training data are used to derive the classification rules with C4.5. In the classification process, the test data are used to estimate the accuracy of those rules. If the accuracy is considered acceptable, the rules can be applied to classify new data.
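
The holdout procedure described above (random two-thirds/one-third split, then accuracy on the test set) can be sketched as follows. This is a minimal illustration, not the authors' system: the records, the `symptom` attribute and the `rule` function are hypothetical stand-ins, and in the paper the rules would be induced by C4.5 from the training set.

```python
import random

def holdout_split(records, train_fraction=2 / 3, seed=0):
    """Randomly partition records into independent training and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def accuracy(classify, test_set):
    """Share of (features, label) pairs whose predicted label matches the true label."""
    correct = sum(1 for features, label in test_set if classify(features) == label)
    return correct / len(test_set)

def rule(features):
    """Hypothetical stand-in for rules induced by C4.5 from the training set."""
    return "toxic" if features["symptom"] == "vomiting" else "non-toxic"

# Hypothetical records: (features, class label); 30 records in total.
data = [({"symptom": s}, "toxic" if s == "vomiting" else "non-toxic")
        for s in ["vomiting", "rash", "vomiting", "none", "rash", "vomiting"] * 5]
train, test = holdout_split(data)
print(len(train), len(test), accuracy(rule, test))  # prints: 20 10 1.0
```

Fixing the seed makes the split reproducible; a real evaluation would usually repeat the split with several seeds before deciding whether the accuracy is acceptable.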

JURTEKSI, 2021, Vol 8 (1), pp. 59-68
Author(s): Christnatalis Christnatalis, Roni Rayandi Saragih, Bobby Christianto Tambunan

Abstract: This study uses the C4.5 classification algorithm to determine creditworthiness. Classification aims to divide the objects under study into a number of categories called classes. In this study, the authors use data mining with the C4.5 algorithm as the selection method. The criteria used are loan installment, prospective customer income, loan term and prospective customer status. The study produced a decision tree classification model using the C4.5 algorithm that falls into the Excellent Classification category, with an accuracy of 98.33% and a classification error of 1.67%, using 70% training data and 30% test data. The results show that the C4.5 algorithm can be used to determine the feasibility of granting credit to customers of Koperasi Jaya Bersama (KORJABE). Keywords: Analysis, Credit Eligibility, C4.5 Algorithm, Data Mining, Method
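
C4.5 ranks candidate attributes by gain ratio, that is, information gain divided by split information. The sketch below computes it for categorical attributes; the applicant rows and their values are hypothetical, and only the attribute names `income` and `status` echo two of the study's criteria.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, attribute, target):
    """C4.5-style gain ratio of splitting `rows` on a categorical `attribute`."""
    n = len(rows)
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])
    weighted = sum(len(p) / n * entropy(p) for p in partitions.values())
    gain = entropy([row[target] for row in rows]) - weighted
    split_info = -sum(len(p) / n * log2(len(p) / n) for p in partitions.values())
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical applicants described by two of the study's criteria.
rows = [
    {"income": "high", "status": "permanent", "eligible": "yes"},
    {"income": "high", "status": "contract", "eligible": "yes"},
    {"income": "low", "status": "permanent", "eligible": "no"},
    {"income": "low", "status": "contract", "eligible": "no"},
]
best = max(["income", "status"], key=lambda a: gain_ratio(rows, a, "eligible"))
print(best)  # prints: income
```

Dividing by split information penalizes attributes with many distinct values, which plain information gain tends to favor.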


2021, Vol 10 (1), pp. 105
Author(s): I Gusti Ayu Purnami Indryaswari, Ida Bagus Made Mahendra

Many Indonesian people, especially in Bali, raise pigs as livestock. Pigs are susceptible to various diseases, and many pig deaths caused by disease have resulted in losses for breeders. Therefore, the authors created an Android-based application that predicts the type of disease in pigs by applying the C4.5 algorithm. C4.5 is an algorithm that classifies data to obtain rules used for prediction. In this study, 50 training records covering 8 types of pig disease and 31 disease symptoms were entered into the system and processed so that the Android application can predict the type of disease in a pig. Testing with 15 test records produced an accuracy of 86.7%. Feature testing of the application, built with the Kotlin programming language and an SQLite database, showed that it runs as expected.
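
Assuming accuracy is the share of correctly classified test records, the reported 86.7% on 15 test records corresponds to 13 correct predictions, since neighboring counts give 12/15 = 80.0% and 14/15 ≈ 93.3%:

```latex
\text{accuracy} = \frac{\text{correctly classified test records}}{\text{total test records}}
                = \frac{13}{15} \approx 0.867 = 86.7\%
```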


2021, Vol 9 (2), pp. 50
Author(s): Budi Hartanto, Sri Tomo

Discipline is very important in the educational process, and it succeeds only if it is applied to students correctly. Student discipline means that every student follows the rules and orders set by the school. At SMK Muhammadiyah 2 Sukoharjo, declining student discipline is marked by an increase in students' violation points. The purpose of this study was to apply the naive Bayes method to classify student discipline levels at SMK Muhammadiyah 2 Sukoharjo. The resulting information can be used to identify which students need Guidance and Counseling to give them direction. The attributes used are fighting, not attending assembly, not doing picket duty, being absent without explanation, arriving late, and being noisy in class. Testing on 490 records split into 75% training data and 25% test data produced an accuracy of 76%.
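
A categorical naive Bayes classifier of the kind described can be sketched as below. This is an illustration under assumptions: Laplace (add-one) smoothing and log-probabilities are common choices rather than details from the paper, and the two attributes and the "low"/"high" discipline labels are hypothetical simplifications of the study's violation attributes.

```python
from collections import Counter, defaultdict
from math import log

class CategoricalNaiveBayes:
    """Naive Bayes for categorical attributes with add-one (Laplace) smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.n = len(labels)
        self.counts = defaultdict(lambda: defaultdict(Counter))  # attr -> class -> value counts
        self.values = defaultdict(set)  # attr -> distinct values seen in training
        for row, label in zip(rows, labels):
            for attr, value in row.items():
                self.counts[attr][label][value] += 1
                self.values[attr].add(value)
        return self

    def predict(self, row):
        best, best_score = None, float("-inf")
        for label, count in self.class_counts.items():
            score = log(count / self.n)  # log prior P(class)
            for attr, value in row.items():
                k = len(self.values[attr])
                # Smoothed log likelihood P(value | class), attributes assumed independent.
                score += log((self.counts[attr][label][value] + 1) / (count + k))
            if score > best_score:
                best, best_score = label, score
        return best

# Hypothetical students described by two violation attributes.
rows = [
    {"fights": "yes", "late": "often"},
    {"fights": "yes", "late": "often"},
    {"fights": "no", "late": "rarely"},
    {"fights": "no", "late": "rarely"},
    {"fights": "no", "late": "often"},
]
labels = ["low", "low", "high", "high", "high"]
model = CategoricalNaiveBayes().fit(rows, labels)
print(model.predict({"fights": "yes", "late": "often"}))  # prints: low
```

Working in log space avoids numerical underflow when many attributes are multiplied together.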


2021, Vol 8 (11), pp. 325-331
Author(s): Eko Hariyanto, Sri Wahyuni, Supina Batubara

The main problem studied in this research is the large number of students lost to attrition, which harms universities because such students are difficult to monitor as a preventive measure. This research is therefore important so that higher-education institutions can detect early (by classification) students who potentially cannot complete their studies on time or who will drop out (DO). Higher-education (PT) institutions, through related parties such as academic advisors and academic bureaus, can then take preventive action by offering the best solutions to the problems students face. This research aims to build a training data model consisting of academic and non-academic factors (including information extracted from social media). This model is then used as the basis for classifying students as having the potential to "graduate on time", "graduate not on time", or "DO". The approach is quantitative, using text mining algorithms to extract knowledge/information from social media for use in training, and data mining algorithms to classify students' potential study completion. The mandatory external output targeted in the first year is a publication in a Scopus Q4 international journal, and in the second year a publication in a Scopus Q3 international journal. Additional external targets in the first and second years are publications in international journals indexed by reputable indexers, teaching books with ISBNs, and copyrights. The technology readiness level (TKT) of this study reaches level 2, the formulation of the technological concept and application for classifying students' potential study completion using data mining. Keywords: student loss, knowledge/information extraction, data classification, text mining, data mining.


Author(s): Hanane Menad, Abdelmalek Amine

Medical data mining has great potential for exploring hidden patterns in medical data sets, and these patterns can be used for clinical diagnosis. Bio-inspired computing is a new field of research; its main advantage is that it knits together subfields related to connectionism, social behavior and emergence. Briefly put, it uses computers to model living phenomena and, at the same time, studies life to improve the use of computers. In this chapter, the authors apply four bio-inspired algorithms and metaheuristics to the classification of seven different real medical data sets. Two of these algorithms are based on computing similarity between training and test data, while the other two randomly generate a population from which classification rules are constructed. The results showed that the bio-inspired algorithms are very efficient for supervised classification of medical data.


2019, Vol 4 (1), pp. 69
Author(s): Kitami Akromunnisa, Rahmat Hidayat

Various scientific works by academics, such as theses, research reports and practical work reports, are available in digital form. In general, however, this growth is not accompanied by a growth in the information or knowledge extracted from these electronic documents. This study aims to classify the abstracts of informatics engineering theses using the K-Nearest Neighbor algorithm. The data consist of 50 Indonesian-language abstracts, 454 English abstracts and 504 titles. Each data set is divided into training data and test data, and the test data are classified automatically with the classifier model built from the training data. For the Indonesian abstract data, classification without stemming achieved the highest accuracy with a 9:1 split (100.0%), compared with 90.0% for 8:2, 80.0% for 7:3, 60.0% for 6:4, and 80.0% using k-fold cross-validation.
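
K-Nearest Neighbor text classification can be sketched with term-frequency vectors and cosine similarity, with no stemming applied. The vector representation, the similarity measure and the tiny training set of thesis topics below are assumptions for illustration; the abstract does not specify them.

```python
from collections import Counter
from math import sqrt

def tf_vector(text):
    """Raw term frequencies of lower-cased, whitespace-separated tokens (no stemming)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def knn_classify(train, text, k=3):
    """Majority label among the k training documents most similar to `text`."""
    query = tf_vector(text)
    neighbours = sorted(train, key=lambda doc: cosine(query, tf_vector(doc[0])), reverse=True)[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

# Hypothetical labelled thesis titles.
train = [
    ("decision tree classification of student data", "data mining"),
    ("clustering customer data with k-means", "data mining"),
    ("naive bayes classification of documents", "data mining"),
    ("android application for hotel booking", "software engineering"),
    ("web application for inventory management", "software engineering"),
]
print(knn_classify(train, "android application for booking rooms"))  # prints: software engineering
```

Changing the train:test ratio (9:1, 8:2, and so on) only changes which documents go into `train`; the classifier itself stays the same.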


2021, Vol 5 (3), pp. 1166
Author(s): Muchamad Sobri Sungkar, M Taufik Qurohman

Computer system architecture is one of the required courses in the informatics engineering study program. In the study program, students' pass rate in each course is an important aspect that must be evaluated every semester. Students passing a course indicates that the learning process is going well and that the material presented by the lecturer in charge can be understood by the students. Whether a student passes can be predicted from the student's habit patterns. Data mining is an alternative way to find such patterns in the data that have been collected. Data mining is a process of extracting valuable information from a data collection that companies, agencies or organizations can use in decision making. Predicting graduation with data mining can be solved by classifying the data set. The C5.0 algorithm is an improvement on the C4.5 algorithm; the process is almost the same, but C5.0 has advantages over its predecessor. The C5.0 algorithm produces a decision tree or rules formed from entropy or gain values. The prediction is carried out with C5.0 classification using the attributes Attendance Value, Assignment Value, UTS Value (midterm exam) and UAS Value (final exam). The final result of the C5.0 classification process is a decision tree containing the rules. The C5.0 algorithm achieves an accuracy of 93.33%.
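
Numeric attributes such as Attendance Value are split in C4.5/C5.0-style trees on a threshold chosen by entropy-based gain. The sketch below tries midpoints between consecutive distinct values and keeps the one with the highest information gain. Using midpoints is a simplification (C4.5 itself places the threshold at an observed value), and the attendance figures and pass/fail labels are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels; 0 for an empty list."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values()) if n else 0.0

def best_threshold(values, labels):
    """Return (threshold, information gain) of the best binary split `value <= threshold`."""
    base = entropy(labels)
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, 0.0)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # only split between distinct values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for v, lab in pairs if v <= t]
        right = [lab for v, lab in pairs if v > t]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if gain > best[1]:
            best = (t, gain)
    return best

# Hypothetical attendance values (%) and pass/fail outcomes.
attendance = [40, 55, 60, 80, 90, 95]
outcome = ["fail", "fail", "fail", "pass", "pass", "pass"]
print(best_threshold(attendance, outcome))  # prints: (70.0, 1.0)
```

The winning threshold yields a rule of the form "if attendance <= 70 then fail", which is how such trees turn into readable rules.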


2020, Vol 4 (3), pp. 569-575
Author(s): Dwi Meylitasari Tarigan, Dian Palupi Rini, Samsuryadi

Diabetes Mellitus (DM) is a disease in which blood sugar levels rise above the maximum limit. Food often contains uncontrolled amounts of sugar, which can cause blood sugar levels to rise drastically. Efforts are needed to increase public awareness of blood sugar control and of the risks of rising blood sugar levels, so that preventive and early-detection measures can be determined. Data mining is an information technology widely used in the health sector as a decision-making aid for predicting and diagnosing several diseases. This research aims to optimize feature selection for classification with the C4.5 algorithm using Particle Swarm Optimization (PSO) to detect blood sugar levels in patients. The dataset concerns the effect of physical activity on blood sugar levels at H. Abdul Manan Simatupang Kisaran Regional Public Hospital and contains 42 records with 10 attributes. The results show that PSO increased the accuracy of C4.5 from 86% to 95% and the AUC from 0.917 to 0.950. PSO selected 7 of the 10 attributes for predicting blood sugar level. Therefore, C4.5 combined with PSO may provide the best accuracy for detecting blood sugar levels.
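
One common way to use PSO for feature selection is binary PSO, where each particle is a bit mask of kept attributes and a sigmoid of the velocity gives the probability that a bit is 1. The abstract does not say which PSO variant was used, so this is only a sketch: `toy_fitness` and its `USEFUL` set are hypothetical placeholders for what would, in the study, presumably be C4.5 accuracy on the selected attributes, and the inertia and acceleration constants are assumed values.

```python
import math
import random

def binary_pso(fitness, n_features, n_particles=10, iterations=30,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Binary PSO: each particle is a 0/1 mask saying which features are kept."""
    rng = random.Random(seed)
    positions = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    velocities = [[0.0] * n_features for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_score = [fitness(p) for p in positions]
    g = max(range(n_particles), key=lambda i: personal_score[i])
    global_best, global_score = personal_best[g][:], personal_score[g]
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * velocities[i][d]
                     + c1 * r1 * (personal_best[i][d] - positions[i][d])
                     + c2 * r2 * (global_best[d] - positions[i][d]))
                velocities[i][d] = max(-v_max, min(v_max, v))  # clamp velocity
                # Sigmoid of the velocity is the probability that the bit becomes 1.
                prob_one = 1 / (1 + math.exp(-velocities[i][d]))
                positions[i][d] = 1 if rng.random() < prob_one else 0
            score = fitness(positions[i])
            if score > personal_score[i]:
                personal_best[i], personal_score[i] = positions[i][:], score
                if score > global_score:
                    global_best, global_score = positions[i][:], score
    return global_best, global_score

USEFUL = {0, 2, 5, 7}  # hypothetical "informative" attribute indices

def toy_fitness(mask):
    """Hypothetical fitness: +1 per useful attribute kept, -1 per useless one kept."""
    return sum(1 if i in USEFUL else -1 for i, bit in enumerate(mask) if bit)

best_mask, best_score = binary_pso(toy_fitness, n_features=10)
print(best_mask, best_score)
```

Because the swarm is stochastic, the result depends on the seed and iteration budget; with a real classifier as fitness, each evaluation would train and test a tree, so the budget matters.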


2011, Vol 2 (1), pp. 49-58
Author(s): Periasamy Vivekanandan, Raju Nedunchezhian

A genetic algorithm (GA) is a search technique based on the process of natural evolution. It is widely used by the data mining community for classification rule discovery in complex domains. During learning, it makes several passes over the data set to determine the accuracy of candidate rules, which makes it an extremely I/O-intensive and slow process. A GA is particularly difficult to apply when the training data set becomes too large or is not fully available. This paper proposes an incremental genetic algorithm based on boosting, which constructs a weak ensemble of classifiers in a fast, incremental manner and thus aims to reduce the learning cost considerably.
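
The boosting idea that the paper builds on can be illustrated with one round of AdaBoost-style reweighting: examples the current weak classifier gets wrong gain weight, so the next member of the ensemble focuses on them. This is the standard AdaBoost update, not the authors' incremental GA, and the uniform starting weights and correctness flags are hypothetical.

```python
import math

def boosting_round(weights, correct):
    """One AdaBoost-style reweighting step.

    weights: current example weights summing to 1.
    correct: per-example booleans, True if the weak classifier was right.
    Returns the normalized new weights and the classifier's vote weight alpha.
    """
    error = sum(w for w, ok in zip(weights, correct) if not ok)
    error = min(max(error, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
    alpha = 0.5 * math.log((1 - error) / error)
    updated = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    total = sum(updated)
    return [w / total for w in updated], alpha

new_weights, alpha = boosting_round([0.25] * 4, [True, True, True, False])
print([round(w, 3) for w in new_weights], round(alpha, 3))
```

After the update, the misclassified examples carry exactly half of the total weight, which is what forces the next weak classifier to do better on them.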


Author(s): Andri Wijaya, Abba Suganda Girsang

This article analyzes customer loyalty using three data mining methods, the C4.5, Naive Bayes and Nearest Neighbor algorithms, applied to real-world empirical data. The data contain ten attributes related to customer loyalty and were obtained from a national multimedia company in Indonesia. The dataset contains 2269 records. The study also evaluates the effect of training data size on classification accuracy. The results suggest that the C4.5 algorithm produces the highest classification accuracy, about 81%, followed by Naive Bayes (76%) and Nearest Neighbor (55%). In addition, the numerical evaluation suggests that a proportion of 80% is optimal for the training set.
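
Evaluating the effect of training-set size amounts to repeating the split at several training proportions and scoring each resulting model on the held-out remainder. The sketch below does this with a toy one-dimensional 1-nearest-neighbor classifier; the data, labels and `one_nearest_neighbor` function are hypothetical stand-ins for the study's ten attributes and its C4.5, Naive Bayes and Nearest Neighbor models.

```python
import random

def accuracy_by_train_fraction(data, fractions, fit_predict, seed=0):
    """Shuffle once, then for each training fraction train on the head and test on the rest."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    results = {}
    for frac in fractions:
        cut = round(len(shuffled) * frac)
        train, test = shuffled[:cut], shuffled[cut:]
        predict = fit_predict(train)
        correct = sum(predict(x) == y for x, y in test)
        results[frac] = correct / len(test)
    return results

def one_nearest_neighbor(train):
    """Toy 1-NN on a single numeric feature (stand-in for a real classifier)."""
    def predict(x):
        return min(train, key=lambda pair: abs(pair[0] - x))[1]
    return predict

# Hypothetical customers: one numeric feature and a loyalty label.
data = [(x, "loyal" if x >= 50 else "not loyal") for x in range(100)]
fractions = [0.5, 0.6, 0.7, 0.8, 0.9]
results = accuracy_by_train_fraction(data, fractions, one_nearest_neighbor)
print(results)
```

Larger training fractions leave fewer test records, so accuracy at high fractions is estimated from a smaller sample and is noisier.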

