Developed third iterative dichotomizer based on feature decisive values for educational data mining

Author(s):  
Saja Taha Ahmed ◽  
Rafah Al-Hamdani ◽  
Muayad Sadik Croock

Decision trees have recently become one of the most widely used classification models. They owe their popularity to their efficiency in predictive analytics, their ease of interpretation, and their implicit feature selection. This last property is of essential significance in Educational Data Mining (EDM), where selecting the most relevant features has a major impact on classification accuracy. The main contribution is a new multi-objective decision tree that can be used for both feature selection and classification. The proposed Decisive Decision Tree (DDT) is constructed from a decisive feature value, which acts as a feature weight related to the target class label. The traditional Iterative Dichotomizer 3 (ID3) algorithm and the proposed DDT are compared on three datasets with respect to known ID3 issues, including the complexity of logarithmic calculations and the selection of multi-valued features. The results indicate that the proposed DDT outperforms ID3 in development time. Classification accuracy improves under 10-fold cross-validation for all datasets, with the highest accuracy achieved by the proposed method being 92% on the student.por dataset, and under holdout validation for two datasets, namely Iraqi and Student-Math. The experiment also shows that the proposed DDT tends to select attributes that are important rather than merely multi-valued.

2021 ◽  
Vol 2 (4) ◽  
pp. 247-253
Author(s):  
Milyani Aritonang

Fertilizer demand at the Plant Protection Development Unit (UPPT) is uncertain because it depends on requests from farmers, so fertilizer needs must be predicted. The UPPT predicts needs for five types of fertilizer: Urea, ZA, SP-36, NPK, and Organic. The prediction applies data mining with the ID3 algorithm, which calculates entropy and gain values to produce a decision tree and its rules as the final result. Testing is done using the Tanagra software. The tests carried out in Tanagra with the ID3 algorithm produce a decision tree, and the calculation likewise yields a decision tree.
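The entropy and gain calculation that ID3 relies on can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy records and the attribute names (`season`, `fertilizer`, `demand`) are hypothetical and do not come from the UPPT data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy reduction obtained by splitting the rows on one attribute."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def best_attribute(rows, attributes, target):
    """ID3 picks the attribute with the largest gain as the next tree node."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical toy records (assumption, for illustration only).
rows = [
    {"season": "rainy", "fertilizer": "urea", "demand": "high"},
    {"season": "rainy", "fertilizer": "npk", "demand": "high"},
    {"season": "dry", "fertilizer": "urea", "demand": "low"},
    {"season": "dry", "fertilizer": "npk", "demand": "low"},
]

root = best_attribute(rows, ["season", "fertilizer"], "demand")
```

In this toy set, `season` separates the classes perfectly, so it has the higher gain and would become the root. ID3 then repeats the same selection on each subset until the leaves are pure.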


Author(s):  
Tyler Swanger ◽  
Kaitlyn Whitlock ◽  
Anthony Scime ◽  
Brendan P. Post

This chapter data mines the usage patterns of the ANGEL Learning Management System (LMS) at a comprehensive college. The data includes counts of all the features ANGEL offers its users for the Fall and Spring semesters of the academic years beginning in 2007 and 2008. Data mining techniques are applied to evaluate which LMS features are used most commonly and most effectively by instructors and students. Classification produces a decision tree which predicts the courses that will use the ANGEL system based on course-specific attributes. The dataset undergoes association mining to discover how the usage of one feature affects the usage of another set of features. Finally, clustering the data identifies messages and files as the features most commonly used. These results can be used by this institution, as well as similar institutions, for decision making concerning feature selection and the overall usefulness of LMS design, selection and implementation.


2019 ◽  
Vol 12 (2) ◽  
pp. 73
Author(s):  
Daniel David

Data mining is one alternative for extracting new information from large amounts of data. One branch of data mining is Educational Data Mining (EDM), which operates in the field of education. By using education-related data, the data mining process can uncover information that is useful for progress in education. This study uses EDM to make use of internal assessment data for individual school students and to predict those students' national final examination results. The data mining uses a classification technique and the C4.5 decision tree method. In addition, a descriptive research method is used in order to give more accurate results. The study is expected to contribute predictions of national final examination results that can be used for subsequent student cohorts.


2020 ◽  
Vol 10 (9) ◽  
pp. 3291
Author(s):  
Jesús F. Pérez-Gómez ◽  
Juana Canul-Reich ◽  
José Hernández-Torruco ◽  
Betania Hernández-Ocaña

Requiring only a few relevant characteristics from patients when diagnosing bacterial vaginosis is highly useful for physicians, as it makes collecting these data less time consuming. This would result in a dataset of patients that can be diagnosed more accurately using only a subset of informative or relevant features rather than the entire set of features. As such, this is a feature selection (FS) problem. In this work, decision tree and Relief algorithms were used as feature selectors. Experiments were conducted on a real dataset for bacterial vaginosis with 396 instances and 252 features/attributes. The dataset was obtained from universities located in Baltimore and Atlanta. The FS algorithms produced feature rankings, from which the top fifteen features formed a new dataset that was used as input for both support vector machine (SVM) and logistic regression (LR) algorithms for classification. For performance evaluation, averages of 30 runs of 10-fold cross-validation were reported, with balanced accuracy, sensitivity, and specificity as performance measures. The results obtained using all features were compared against those obtained using the top fifteen. The top-ranked attributes were similar to those reported in the literature. This study is part of ongoing research that is investigating a range of feature selection and classification methods.
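The three performance measures named above can be computed from a binary confusion matrix, as in the sketch below. It uses the standard definitions and is not the authors' evaluation code. The label vectors are hypothetical, and the encoding of the positive class as `1` is an assumption.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for a binary classification problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn


def sensitivity_specificity_balanced(y_true, y_pred, positive=1):
    """Sensitivity (recall on positives), specificity (recall on negatives),
    and balanced accuracy (their arithmetic mean)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity, (sensitivity + specificity) / 2


# Hypothetical labels: 1 = positive diagnosis, 0 = negative.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
sens, spec, bal_acc = sensitivity_specificity_balanced(y_true, y_pred)
```

Balanced accuracy is a reasonable choice when the classes are of unequal size, because plain accuracy can look high for a classifier that mostly predicts the majority class.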


2021 ◽  
Vol 10 (3) ◽  
pp. 121-127
Author(s):  
Bareen Haval ◽  
Karwan Jameel Abdulrahman ◽  
Araz Rajab

This article presents the results of applying educational data mining techniques to the academic performance of students. Three classification models (Decision Tree, Random Forest and Deep Learning) were developed to analyze the datasets and predict student performance. The predictive performance of the three classifiers was calculated and compared. The students' academic history and data from the Office of the Registrar were used to train the models. The analysis aims to evaluate student results using variables such as the student's grade. Data from 221 students with 9 attributes were used. The results of this study provide a better understanding of how student success is assessed and underline the importance of data mining in education. The main purpose of the study is to show how student success can be forecast using data mining techniques in order to improve academic programs. The results indicate that the Decision Tree classifier outperforms the other two classifiers, achieving a total prediction accuracy of 97%.


2016 ◽  
Vol 2 (2) ◽  
pp. 60
Author(s):  
Abidatul Izzah ◽  
Ratna Widyastuti

Abstract. A college is an institution that holds data that can be very informative if mined properly. Predicting student graduation is a widely studied problem in higher education. With a prediction of students' pass status available mid-semester, lecturers can anticipate problems or give special attention to students who are predicted not to pass. The methods used for prediction vary widely and include the Fuzzy Inference System (FIS). In practice, however, fuzzy rules are often generated randomly or from expert knowledge, so they do not represent the data distribution. Therefore, this study uses a Decision Tree (DT) technique to generate the rules. The research aims to predict course completion using a hybrid of FIS and DT. The dataset consists of the posttest, assignment, quiz, and midterm exam scores of 106 students of Politeknik Kediri who took the Algorithms and Data Structures course. The research starts by generating 5 rules with the decision tree, which are then used for inference. The next stage is the implementation of the FIS, which consists of fuzzification, inference, and defuzzification. The resulting accuracy, sensitivity, and specificity are 94.33%, 96.55%, and 84.21%, respectively.
Keywords: Decision Tree, Educational Data Mining, Fuzzy Inference System, Prediction.


2018 ◽  
Vol 7 (2) ◽  
pp. 44-47
Author(s):  
Mudasir Ashraf ◽  
Majid Zaman ◽  
Muheet Ahmed

Educational data mining has seen increasing demand for extracting and manipulating data from academic settings to generate useful information that is indispensable for decision making. In this paper, an attempt has been made to apply various data mining techniques, including base and meta learning classifiers, to our pedagogical dataset to predict the performance of students. Among several contemporary ensemble approaches, researchers have widely used learning classifiers such as boosting to predict student performance. Because ensemble methods are considered significant in classification and prediction, the same method (boosting) has been applied to our pedagogical dataset. All results were evaluated with 10-fold cross-validation after the pedagogical dataset had been subjected to base classifiers including J48, random tree, naive Bayes and k-NN. In addition, oversampling (SMOTE) and undersampling (spread subsampling) were employed to draw a further comparison between ensemble classifiers and base classifiers. These methods were used with the key objective of observing any improvement in the accuracy of predicting student performance.
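The 10-fold cross-validation protocol mentioned above partitions the data into ten folds, each used once for testing while the rest is used for training. The sketch below shows only the index bookkeeping. It is not tied to any particular toolkit, and the function name `k_fold_indices`, the seed, and the sample count of 25 are assumptions made for illustration.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Indices are shuffled once, then dealt round-robin into k folds so that
    fold sizes differ by at most one.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Every sample appears in exactly one test fold across the ten splits.
splits = list(k_fold_indices(25, k=10))
```

A classifier would be trained on each `train` list and scored on the matching `test` list, and the ten scores averaged. Resampling steps such as SMOTE are normally applied to the training portion only, so that synthetic samples do not leak into the test fold.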


Author(s):  
Maryam Zaffar ◽  
Manzoor Ahmad Hashmani ◽  
K.S. Savita ◽  
Syed Sajjad Hussain Rizvi ◽  
Mubashar Rehman

Educational Data Mining (EDM) is a very active area of Data Mining (DM), and it is helpful in predicting the performance of students. Student performance prediction is important not only for the student but also for academic organizations, which can use it to detect the causes of student success and failure. Furthermore, the features selected by student performance prediction models help in developing action plans for academic welfare. Feature selection can increase the accuracy of the prediction model. In a student performance prediction model every feature matters, since neglecting an important feature can lead to poorly designed academic action plans. Feature selection is therefore a very important step in the development of student performance prediction models. There are different types of feature selection algorithms. In this paper, the Fast Correlation-Based Filter (FCBF) is selected as the feature selection algorithm. This paper is a step towards identifying the factors that affect the academic performance of students. The performance of FCBF is evaluated on three different student datasets. FCBF is found to perform well on the student dataset with the greater number of features.
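FCBF ranks features by symmetrical uncertainty (SU), a normalized form of mutual information between a feature and the class label. The sketch below computes SU for discrete values only. It does not include FCBF's redundancy-removal step, and the feature names and data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), ranging from 0 to 1."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant; convention for this sketch
    h_xy = entropy(list(zip(x, y)))      # joint entropy H(X, Y)
    mutual_information = h_x + h_y - h_xy
    return 2.0 * mutual_information / (h_x + h_y)


# Hypothetical discrete features and class labels (illustration only).
passed = [1, 1, 0, 0]
attendance = ["high", "high", "low", "low"]   # fully informative
transport = ["bus", "car", "bus", "car"]      # uninformative

su_attendance = symmetrical_uncertainty(attendance, passed)
su_transport = symmetrical_uncertainty(transport, passed)
```

A feature with SU close to 1 predicts the class almost perfectly, while a value near 0 means the feature carries no information about it. FCBF keeps features whose SU with the class exceeds a threshold and then discards those that are redundant with a higher-ranked feature.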


2016 ◽  
Vol 7 (4) ◽  
Author(s):  
Mochammad Yusa ◽  
Ema Utami ◽  
Emha T. Luthfi

Abstract. Readmission is associated with measures of the quality of patient care in hospitals. The many attributes related to diabetic patients, such as medication, ethnicity, race, lifestyle, age, and others, make the calculation of care quality complicated. Classification, a data mining technique that has developed significantly, can address this problem. In this paper, three classifiers, namely Decision Tree, k-Nearest Neighbor (k-NN), and Naive Bayes, are evaluated with various parameter settings using 10-fold cross-validation. Performance is evaluated in terms of Accuracy, Mean Absolute Error (MAE), and Kappa Statistic. The selected dataset consists of 47 attributes and 49,735 records. The results show that the k-NN classifier with k=100 performs better in terms of accuracy and Kappa Statistic, while Naive Bayes outperforms the other classifiers in terms of MAE. The Indonesian-language version of this abstract instead states that the best accuracy, MAE, and Kappa Statistic were obtained with the Naive Bayes model.
Keywords: k-NN, naive bayes, diabetes, readmission
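Assuming the Kappa Statistic reported here is Cohen's kappa, it measures how far the agreement between predicted and true labels exceeds the agreement expected by chance. The sketch below uses the standard formula. The labels are hypothetical and unrelated to the readmission dataset.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_observed - p_expected) / (1 - p_expected)."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: product of marginal label frequencies, summed per class.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1:
        return 0.0  # kappa is undefined here; 0.0 is a convention for this sketch
    return (observed - expected) / (1 - expected)


# Hypothetical labels (illustration only).
y_true = ["yes", "yes", "no", "no"]
y_pred = ["yes", "no", "no", "no"]
kappa = cohen_kappa(y_true, y_pred)
```

Here the observed agreement is 0.75 and the chance agreement is 0.5, giving a kappa of 0.5. Unlike raw accuracy, kappa stays near zero for a classifier that only reproduces the class proportions, which matters when one outcome is much more common than the other.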

