Comparative Study of Data Mining Classifiers for Students’ Academic Performance

Classification is a data mining method used to organize and categorize data into distinct classes. This study aims to compare nonparametric algorithms and determine which performs best at classifying facial images. It uses four nonparametric classification algorithms, k-Nearest Neighbor (kNN), Support Vector Machine (SVM), Decision Tree, and AdaBoost, to classify facial images of Indonesian residents from the Batak, Dayak, Jawa (Javanese), Melayu (Malay), and Tionghoa (Chinese Indonesian) ethnic groups. Orange Data Mining Tool was used to carry out the data mining process. Of the four algorithms, SVM gave better accuracy than the others. The average precision values were 37.5% for Support Vector Machine, followed by 31.55% for k-Nearest Neighbor, 30.25% for AdaBoost, and 29.75% for Decision Tree.
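The abstract reports an average precision per algorithm but does not say how the average was taken. As a minimal sketch, assuming a macro average (the unweighted mean of per-class precision), the calculation could look like the following. The function names and the toy labels are hypothetical and are not data from the study.

```python
from collections import Counter


def per_class_precision(y_true, y_pred):
    """Return {label: precision}, where precision = TP / (TP + FP)."""
    predicted = Counter(y_pred)  # how often each label was predicted
    correct = Counter(p for t, p in zip(y_true, y_pred) if t == p)  # true positives
    labels = sorted(set(y_true) | set(y_pred))
    # Assumption: a label that was never predicted gets precision 0.0.
    return {c: (correct[c] / predicted[c] if predicted[c] else 0.0) for c in labels}


def macro_precision(y_true, y_pred):
    """Unweighted mean of the per-class precision values."""
    scores = per_class_precision(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical toy labels, not results from the study.
y_true = ["Batak", "Batak", "Jawa", "Jawa", "Melayu", "Melayu"]
y_pred = ["Batak", "Jawa", "Jawa", "Jawa", "Melayu", "Batak"]

print(per_class_precision(y_true, y_pred))
print(macro_precision(y_true, y_pred))
```

A macro average treats every class equally, so with five ethnic groups a classifier that guesses at random would score roughly 20%; this is one way to read the reported 29.75% to 37.5% range.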


Identification of Models-Decision Tree and Random Forest Classifier using Rattle on Diabetes Disease

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.i1033.0789s219 ◽

2019 ◽

Vol 8 (9S2) ◽

pp. 172-176

Keyword(s):

Data Mining ◽

Random Forest ◽

Decision Tree ◽

Forest Tree ◽

Medical Expert ◽

Data Mining Tool ◽

Random Forest Tree ◽

Mining Tool

Diabetes is an increasingly common disease, and many patients around the world suffer from it. Medical data covering many diseases is very large, so the first step is to choose a mining tool that gives the best results for the given databases. Such medical data is statistical in nature, and most researchers work with data of this type. Data mining tools are used to obtain more accurate results on diabetes databases, and medical experts and researchers use data mining techniques to analyze the results and provide the best treatment for the disease. In this paper we apply diabetes data to Rattle, an open-source data mining tool, and use two classification methods, decision tree and random forest, to classify the data and show which classification algorithm is best for the diabetes dataset.


Appraisal of the Classification Technique in Data Mining of Student Performance using J48 Decision Tree, K-Nearest Neighbor and Multilayer Perceptron Algorithms

International Journal of Computer Applications ◽

10.5120/ijca2018916751 ◽

2018 ◽

Vol 179 (33) ◽

pp. 39-46 ◽

Cited By ~ 1

Author(s):

Faiza Umar ◽

Najim Ussiph

Keyword(s):

Data Mining ◽

Decision Tree ◽

Student Performance ◽

Multilayer Perceptron ◽

Nearest Neighbor ◽

K Nearest Neighbor ◽

Classification Technique ◽

J48 Decision Tree


Z - CRIME: A data mining tool for the detection of suspicious criminal activities based on decision tree

2014 International Conference on Data Mining and Intelligent Computing (ICDMIC) ◽

10.1109/icdmic.2014.6954268 ◽

2014 ◽

Cited By ~ 7

Author(s):

Mugdha Sharma

Keyword(s):

Data Mining ◽

Decision Tree ◽

Data Mining Tool ◽

Mining Tool


Application of Decision Tree as a Data Mining Tool in a Manufacturing System

Selected Readings on Database Technologies and Applications ◽

10.4018/978-1-60566-098-1.ch011 ◽

2011 ◽

pp. 234-251

Author(s):

S. A. Oke

Keyword(s):

Data Mining ◽

Decision Making ◽

Decision Tree ◽

Manufacturing Systems ◽

Manufacturing System ◽

Research Activity ◽

Data Mining Tool ◽

Mining Tool ◽

Classification Prediction ◽

Effective Decision Making

This work demonstrates the application of decision tree, a data mining tool, in the manufacturing system. Data mining has the capability for classification, prediction, estimation, and pattern recognition by using manufacturing databases. Databases of manufacturing systems contain significant information for decision making, which could be properly revealed with the application of appropriate data mining techniques. Decision trees are employed for identifying valuable information in manufacturing databases. Practically, industrial managers would be able to make better use of manufacturing data at little or no extra investment in data manipulation cost. The work shows that it is valuable for managers to mine data for better and more effective decision making. This work is therefore new in that it is the first time that proper documentation would be made in the direction of the current research activity.


Application of Decision Tree as a Data Mining Tool in a Manufacturing System

Database Technologies ◽

10.4018/978-1-60566-058-5.ch054 ◽

2009 ◽

pp. 940-955

Author(s):

S. A. Oke

Keyword(s):

Data Mining ◽

Decision Making ◽

Decision Tree ◽

Manufacturing Systems ◽

Manufacturing System ◽

Research Activity ◽

Data Mining Tool ◽

Mining Tool ◽

Classification Prediction ◽

Effective Decision Making

This work demonstrates the application of decision tree, a data mining tool, in the manufacturing system. Data mining has the capability for classification, prediction, estimation, and pattern recognition by using manufacturing databases. Databases of manufacturing systems contain significant information for decision making, which could be properly revealed with the application of appropriate data mining techniques. Decision trees are employed for identifying valuable information in manufacturing databases. Practically, industrial managers would be able to make better use of manufacturing data at little or no extra investment in data manipulation cost. The work shows that it is valuable for managers to mine data for better and more effective decision making. This work is therefore new in that it is the first time that proper documentation would be made in the direction of the current research activity.


Application of Decision Tree as a Data mining Tool in a Manufacturing System

Intelligent Databases ◽

10.4018/978-1-59904-120-9.ch006 ◽

2011 ◽

pp. 117-136

Author(s):

S.A. Oke

Keyword(s):

Data Mining ◽

Decision Making ◽

Decision Tree ◽

Manufacturing Systems ◽

Manufacturing System ◽

Research Activity ◽

Data Mining Tool ◽

Mining Tool ◽

Classification Prediction ◽

Effective Decision Making

This work demonstrates the application of decision tree, a data mining tool, in the manufacturing system. Data mining has the capability for classification, prediction, estimation, and pattern recognition by using manufacturing databases. Databases of manufacturing systems contain significant information for decision making, which could be properly revealed with the application of appropriate data mining techniques. Decision trees are employed for identifying valuable information in manufacturing databases. Practically, industrial managers would be able to make better use of manufacturing data at little or no extra investment in data manipulation cost. The work shows that it is valuable for managers to mine data for better and more effective decision making. This work is therefore new in that it is the first time that proper documentation would be made in the direction of the current research activity.


Using Decision Tree Classifier for Analyzing Students’ Activities

JITA - Journal of Information Technology and Applications (Banja Luka) - APEIRON ◽

10.7251/jit1302087m ◽

2013 ◽

Vol 6 (2) ◽

Cited By ~ 1

Author(s):

Snježana Milinković ◽

Mirjana Maksimović

Keyword(s):

Data Mining ◽

Data Analysis ◽

Decision Tree ◽

Electrical Engineering ◽

Decision Tree Classifier ◽

Final Exam ◽

Data Mining Tool ◽

Tree Classifier ◽

Mining Tool

This paper analyzes data on students’ activities in the course Introduction to Programming at the Faculty of Electrical Engineering in East Sarajevo. Using data stored in the Moodle database combined with manually collected data, a model was developed to predict whether students would pass the final exam. The goal was to identify variables that could help teachers predict students’ performance and make specific recommendations for improving individual activities that directly influence success in the final exam. The model was created using a decision tree classifier, and experiments were performed using the WEKA data mining tool. The effect of input attributes on model performance was analyzed, and by applying appropriate techniques a higher accuracy of the generated model was achieved.
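The abstract does not state how the influence of input attributes was measured. One common way to rank attributes for a decision tree is information gain, sketched below under that assumption. The attribute names and the four records are hypothetical and do not come from the Moodle data used in the paper.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on `values`."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical records: one attribute separates the classes, the other does not.
attended = ["yes", "yes", "no", "no"]
homework = ["done", "missing", "done", "missing"]
passed = ["pass", "pass", "fail", "fail"]

print(information_gain(attended, passed))
print(information_gain(homework, passed))
```

In this toy data the `attended` attribute splits the records into pure groups, so it carries the full one bit of information, while `homework` carries none. An attribute with gain near zero is a candidate for removal before training.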


Penanganan Ketidakseimbangan Data pada Prediksi Customer Churn Menggunakan Kombinasi SMOTE dan Boosting (Handling Data Imbalance in Customer Churn Prediction Using a Combination of SMOTE and Boosting)

IJCIT (Indonesian Journal on Computer and Information Technology) ◽

10.31294/ijcit.v6i1.9545 ◽

2021 ◽

Vol 6 (1) ◽

Author(s):

Nana Suryana ◽

Pratiwi Pratiwi ◽

Rizki Tri Prasetio

Keyword(s):

Data Mining ◽

Deep Learning ◽

Random Forest ◽

Decision Tree ◽

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

K Nearest Neighbor ◽

Customer Churn ◽

Number Of Customers

The telecommunications industry faces stiff competition between service providers. This competition results in customer churn, the movement of customers from one service to another. Customer churn is a major problem because it can affect a company's revenue, profitability, survival, and service quality. Identifying early which customers are likely to churn is therefore an effective measure, because it helps companies plan how to retain their customers. Customers who have left a service usually make up only a small share of a company's data. This scarcity of churn examples makes customer churn difficult to predict and is a challenging issue in machine learning. The general purpose of this research is to predict which customers will churn. The specific purpose is to handle data imbalance in customer churn prediction using data-level optimization through a sampling method, the Synthetic Minority Over-sampling Technique (SMOTE), combined with algorithm-level optimization through Boosting. Several prediction algorithms, namely random forest, naïve Bayes, decision tree, k-nearest neighbor, and deep learning, are implemented to find the best algorithm after optimization with SMOTE and Boosting. The method used in this study is CRISP-DM, a data mining research framework for cross-industry research. The results indicate that random forest produces the best accuracy after optimization with SMOTE and Boosting, at 89.19%.
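The data-level step can be illustrated with a simplified sketch of the SMOTE idea: each synthetic row is placed at a random point on the line segment between a minority-class row and one of its nearest minority-class neighbours. This is not the authors' pipeline and omits the Boosting stage. The function name `smote`, the parameter `k`, and the sample rows are assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x among the other minority samples
        others = [p for p in minority if p is not x]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


# Hypothetical minority-class (churner) rows with two numeric features.
churners = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_rows = smote(churners, n_new=6)
print(new_rows)
```

Because every synthetic row lies between two real minority rows, the oversampled class stays inside the region the real churners occupy rather than being a set of exact duplicates. The original SMOTE procedure iterates over every minority sample; picking the base sample at random here is a simplification.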


Applicability of Traditional Classification Techniques on Educational Data

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.e6149.038620 ◽

2020 ◽

Vol 8 (6) ◽

pp. 1672-1677

Keyword(s):

Data Mining ◽

Pilot Study ◽

Decision Tree ◽

Student Performance ◽

Performance Prediction ◽

Nearest Neighbor ◽

Class Imbalance ◽

Classification Techniques ◽

Statistical Measures ◽

Traditional Classification

Student performance prediction and analysis is an essential part of higher educational institutions and helps in the overall betterment of the educational system. Traditional Data Mining (DM) techniques such as regression and classification are prominently used for analyzing data from educational settings. The use of DM in academics is called Educational Data Mining (EDM). The current pilot study aims to determine the applicability of standalone classification techniques, namely Decision Tree, BayesNet, Nearest Neighbor, Rule-Based, and Random Forest (RF). The study uses the WEKA tool to implement these techniques on a standard dataset containing student academic information and background. The paper also implements feature selection to identify the most influential features in the dataset, which reduces the dimensionality of the dataset and enhances the accuracy of the classifier. The classifiers are compared on the basis of standard statistical measures such as Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Kappa. The results show the applicability of classification algorithms for student performance prediction, which will help under-achievers and struggling students to improve. The J48 Decision Tree algorithm gave the best results. Further, the comparative analysis shows that individual classifiers give different accuracy on the same dataset due to class imbalance in a multiclass dataset.
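The three measures named in the abstract can be sketched in a few lines. This is an illustrative implementation, not WEKA's code: MAE and RMSE are computed here on numeric targets versus predicted scores, and Kappa is Cohen's kappa on class labels. All sample values are hypothetical.

```python
import math
from collections import Counter


def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Square Error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def cohen_kappa(y_true, y_pred):
    """Agreement between true and predicted labels, corrected for chance.
    Undefined (division by zero) when chance agreement equals 1."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical values: 1.0 = pass, 0.0 = fail, with predicted pass probabilities.
actual = [1.0, 1.0, 0.0, 0.0]
scores = [0.9, 0.4, 0.2, 0.1]
y_true = ["pass", "pass", "fail", "fail"]
y_pred = ["pass", "fail", "fail", "fail"]

print(mae(actual, scores), rmse(actual, scores), cohen_kappa(y_true, y_pred))
```

Kappa is useful alongside accuracy on imbalanced data, because a classifier that always predicts the majority class can reach high accuracy while its kappa stays at zero.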
