An Ingenious Methodology for the Collation of Existing Algorithms for the Prognosis of Student Performance

This work applies data mining, an automated procedure for discovering interesting patterns in large data sets by grouping records and building comprehensible predictive models. Predicting a student's academic performance is crucial, especially for universities. Educational Data Mining (EDM) is an approach for extracting useful information that could affect an institution. A student's academic performance is influenced by many factors. This study evaluates several factors suspected of affecting a student's academic performance and derives a model that classifies and forecasts the student's learning outcomes. The aim is to conduct a case study on the factors that influence students' academic achievement and to determine which factors have the greatest impact. The paper evaluates academic achievement on the basis of correctly and incorrectly classified instances using the Naive Bayes and Random Forest algorithms, and compares the two classifiers on student data to determine the better algorithm.
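Scoring a classifier by its correctly and incorrectly classified instances can be illustrated with a minimal sketch. The code below is not the authors' implementation: it is a small categorical Naive Bayes with add-one (Laplace) smoothing, and the feature names, records and labels are hypothetical toy data. Random Forest is omitted to keep the example dependency-free.

```python
import math
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (feature index, label) -> value counts
    feature_values = defaultdict(set)     # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values

def predict_nb(model, row):
    """Return the label with the highest log-posterior under add-one smoothing."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        for i, value in enumerate(row):
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(feature_values[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical student records: (attendance, assignments) -> outcome
train_rows = [("high", "done"), ("high", "done"), ("high", "missed"),
              ("low", "done"), ("low", "missed"), ("low", "missed")]
train_labels = ["pass", "pass", "pass", "fail", "fail", "fail"]
test_rows = [("high", "done"), ("low", "missed"), ("high", "missed"), ("low", "done")]
test_labels = ["pass", "fail", "pass", "fail"]

model = train_nb(train_rows, train_labels)
predictions = [predict_nb(model, row) for row in test_rows]
correct = sum(p == y for p, y in zip(predictions, test_labels))
incorrect = len(test_labels) - correct
```

The same `correct` and `incorrect` counts could be computed for any second classifier on the same test set, which is the kind of side-by-side comparison the abstract describes.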

Algoritma Naïve Bayes Untuk Memprediksi Kredit Macet Pada Koperasi Simpan Pinjam [Naïve Bayes Algorithm for Predicting Bad Credit at a Savings and Loan Cooperative]

Jurnal Informatika Upgris ◽

10.26877/jiu.v4i2.2919 ◽

2019 ◽

Vol 4 (2) ◽

Author(s):

Diah Puspitasari ◽

Syifa Sintia Al Khautsar ◽

Wida Prima Mustika

Keyword(s):

Data Mining ◽

Predictive Value ◽

Naive Bayes ◽

False Negative ◽

False Negative Rate ◽

True Positive Rate ◽

Naïve Bayes ◽

Data Mining Technique ◽

Application Form ◽

Using Data

Cooperatives are organizations that can help people, especially small and medium-sized communities. They play an important role in the community's economic growth, for example by offering basic commodities at relatively low prices; some cooperatives also offer savings and loan services. A problem faced by the cooperative studied here is that borrowers have difficulty repaying loan installments, which results in bad credit. The cooperative performs credit analysis manually: the applicant fills out a loan application form with the required documents, and a field survey is conducted. Lending to borrowers therefore needs to be evaluated. To reduce these problems, data mining is used to identify the customer criteria that predict bad loans and to determine whether an applicant is eligible for credit. The data mining technique used is classification with the Naive Bayes method. Testing of the resulting model gave an accuracy of 59%, a sensitivity (True Positive Rate, TP Rate, or recall) of 46.80%, a specificity (True Negative Rate, TN Rate) of 69.81%, a Positive Predictive Value (PPV, or precision) of 57.89%, and a Negative Predictive Value (NPV) of 59.67%.
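The five reported measures all derive from a binary confusion matrix under their standard definitions. The sketch below shows the arithmetic. The four counts are an assumption: they are not given in the abstract and were chosen only because they agree with the reported percentages up to the last digit.

```python
def classification_metrics(tp, fn, fp, tn):
    """Standard measures derived from a binary confusion matrix."""
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "sensitivity": tp / (tp + fn),   # true positive rate, recall
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value, precision
        "npv": tn / (tn + fn),           # negative predictive value
    }

# Hypothetical counts for 100 test instances (assumed, not taken from the paper)
metrics = classification_metrics(tp=22, fn=25, fp=16, tn=37)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

With these assumed counts, sensitivity is 22/47 and NPV is 37/62, which match the reported 46.80% and 59.67% if those figures were truncated rather than rounded.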

Performance of Naïve Bayes, C4.5 and KNN using Breast Cancer, Iris and Hypothyroid Datasets

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.c8795.019320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 2193-2197

Keyword(s):

Breast Cancer ◽

Data Mining ◽

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

Specific Pattern ◽

K Nearest Neighbor ◽

Data Mining Technique ◽

Digital Format ◽

Tree Classifier

Data mining usually refers to the discovery of specific patterns in, or the analysis of, data from a large dataset. Classification is an efficient data mining technique in which the classes into which the data are sorted are predefined using existing datasets. Classifying medical records by their symptoms with a computerized method, and storing the predicted information in digital format, is of great importance for diagnosing various diseases. This paper concentrates on finding the algorithm with the highest accuracy so that a cost-effective algorithm can be identified. The data mining classification algorithms are compared on their accuracy in finding the correct records according to the diagnosis report and on their execution rate, which indicates how fast the records are classified. The algorithms used in this study are the Naive Bayes classifier, the C4.5 tree classifier and K-Nearest Neighbor (KNN), with the goal of predicting which is best suited to classifying any kind of medical dataset. The Breast Cancer, Iris and Hypothyroid datasets are used to determine which of the three algorithms classifies the datasets with the highest accuracy in finding the records of patients with particular health problems. The experimental results, presented as tables and graphs, show the performance and importance of the Naïve Bayes, C4.5 and K-Nearest Neighbor algorithms. On these results, the C4.5 algorithm performs considerably better than the Naïve Bayes and K-Nearest Neighbor algorithms.

Predicting heart ailment in patients with varying number of features using data mining techniques

International Journal of Informatics and Communication Technology (IJ-ICT) ◽

10.11591/ijict.v8i1.pp56-62 ◽

2019 ◽

Vol 8 (1) ◽

pp. 56

Author(s):

T R Stella Mary ◽

Shoney Sebastian

Keyword(s):

Data Mining ◽

Heart Disease ◽

Random Forest ◽

Naive Bayes ◽

Heart Diseases ◽

Naïve Bayes ◽

Bayes Classifier ◽

Data Mining Techniques ◽

Using Data ◽

Almost All

Data mining can be defined as the process of extracting previously unknown, verifiable and potentially useful information from data. Heart ailment is one of the primary causes of death around the globe, so a detailed analysis is carried out here using data mining. Studies often restrict themselves to a minimal set of attributes for predicting whether a patient has heart disease, and in doing so miss many important attributes that are main causes of heart disease. This research therefore considers almost all the important features affecting heart disease and performs the analysis step by step, from a minimal to a maximal set of attributes, using data mining techniques to predict heart ailments. The classification methods used are the Naïve Bayes classifier, Random Forest and Random Tree, applied to three datasets with different numbers of attributes but a common class label. The analysis shows a gradual increase in prediction accuracy as the number of attributes increases, irrespective of the classifier used, and Naïve Bayes and Random Forest perform comparatively better on these datasets.

Predicting Heart Ailment in Patients with Varying number of Features using Data Mining Techniques

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v9i4.pp2675-2681 ◽

2019 ◽

Vol 9 (4) ◽

pp. 2675

Author(s):

T R Stella Mary ◽

Shoney Sebastian

Keyword(s):

Data Mining ◽

Heart Disease ◽

Random Forest ◽

Naive Bayes ◽

Heart Diseases ◽

Naïve Bayes ◽

Bayes Classifier ◽

Data Mining Techniques ◽

Using Data ◽

Almost All

Data mining can be defined as the process of extracting previously unknown, verifiable and potentially useful information from data. Heart ailment is one of the primary causes of death around the globe, so a detailed analysis is carried out here using data mining. Studies often restrict themselves to a minimal set of attributes for predicting whether a patient has heart disease, and in doing so miss many important attributes that are main causes of heart disease. This research therefore considers almost all the important features affecting heart disease and performs the analysis step by step, from a minimal to a maximal set of attributes, using data mining techniques to predict heart ailments. The classification methods used are the Naïve Bayes classifier, Random Forest and Random Tree, applied to three datasets with different numbers of attributes but a common class label. The analysis shows a gradual increase in prediction accuracy as the number of attributes increases, irrespective of the classifier used, and Naïve Bayes and Random Forest perform comparatively better on these datasets.

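Optimasi Naive Bayes Menggunakan Algoritma Genetika Sebagai Seleksi Fitur Untuk Memprediksi Performa Siswa [Optimizing Naive Bayes Using a Genetic Algorithm as Feature Selection to Predict Student Performance]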

Jurnal Ilmiah Teknologi Informasi Asia ◽

10.32815/jitika.v14i1.400 ◽

2020 ◽

Vol 14 (1) ◽

pp. 31

Author(s):

Suhendro Busono

Keyword(s):

Data Mining ◽

Genetic Algorithm ◽

Student Performance ◽

Parent Education ◽

Naive Bayes ◽

Electronic Media ◽

Naïve Bayes ◽

Parent Support ◽

Long Time ◽

Parent Relation

In this era of globalisation, the morality of teenagers is declining. This phenomenon can be seen in the mass and electronic media, which report that negative incidents often occur among teenagers: brawls, drug use, gambling, rape, disobedience to parents, and others. Such incidents are not caused by the teenagers themselves but are triggered by bad habits. A lack of parental attention and a poor-quality relationship with parents can instil bad habits in children. Parents' education, parents' occupation and parental support for education can influence children's mindset, as can the time children spend studying, their free time, the time they spend with friends, and the time they spend on the internet. These habits can be measured with science and technology. Data mining, a branch of computer science, can measure how well children or adults perform based on indicators of the habits that shape them. In previous research on student performance using the Naive Bayes method, the number of attributes was large (33 attributes) and the accuracy was 91.15%. In this research, the attributes from the previous research are optimised with a Genetic Algorithm, which selects the relevant attributes. Selecting relevant attributes can increase accuracy: the accuracy after applying the Genetic Algorithm is 97.21%.
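Genetic-algorithm feature selection of the kind described above can be sketched as a search over boolean masks, one bit per attribute. The code below is an illustration under stated assumptions, not the author's method: the operators (tournament selection, one-point crossover, bit-flip mutation, elitism) and all parameter values are assumed, and `toy_fitness` is a hypothetical stand-in for the real objective, which would be the accuracy of a Naive Bayes model trained on the selected attributes.

```python
import random

def select_features(n_features, fitness, pop_size=20, generations=30,
                    mutation_rate=0.05, seed=0):
    """Return (best_mask, history); history[i] is the best fitness in generation i.

    Requires n_features >= 2 so that a crossover point exists.
    """
    rng = random.Random(seed)
    population = [[rng.random() < 0.5 for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    history = [fitness(best)]
    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            # Tournament selection: the fitter of two random masks becomes a parent
            parent_a = max(rng.sample(population, 2), key=fitness)
            parent_b = max(rng.sample(population, 2), key=fitness)
            # One-point crossover followed by bit-flip mutation
            cut = rng.randrange(1, n_features)
            child = parent_a[:cut] + parent_b[cut:]
            child = [(not bit) if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = children
        best = max(population, key=fitness)
        history.append(fitness(best))
    return best, history

# Hypothetical objective: reward three "relevant" attributes, penalise the rest
RELEVANT = {0, 3, 5}

def toy_fitness(mask):
    chosen = {i for i, bit in enumerate(mask) if bit}
    return len(chosen & RELEVANT) - 0.5 * len(chosen - RELEVANT)

best_mask, history = select_features(8, toy_fitness)
```

Because the best mask is copied unchanged into each new generation, the best fitness never decreases from one generation to the next.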

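Komparasi Algoritma Klasifikasi Data Mining untuk Memprediksi Tingkat Kematian Dini Kanker dengan Dataset Early Death Cancer [Comparison of Data Mining Classification Algorithms for Predicting the Early Cancer Death Rate with the Early Death Cancer Dataset]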

JOINTECS (Journal of Information Technology and Computer Science) ◽

10.31328/jointecs.v4i2.1008 ◽

2019 ◽

Vol 4 (2) ◽

pp. 63

Author(s):

Panny Agustia Rahayuningsih

Keyword(s):

Neural Network ◽

Data Mining ◽

Random Forest ◽

Cross Validation ◽

Naive Bayes ◽

Early Death ◽

Naïve Bayes ◽

T Test ◽

Fold Cross Validation

Cancer is among the ten leading causes of death in the world. It is a malignant disease that is difficult to cure once it has spread too widely. However, detecting cancer cells as early as possible can reduce the risk of death. This study aims to predict the early cancer death rate in the European population using five classification algorithms, namely Decision Tree, Naïve Bayes, k-Nearest Neighbour, Random Forest and Neural Network, and to determine which of them is best for this study. Testing proceeds through several stages: dataset (data collection), initial data processing, the proposed method, testing of the method using 10-fold cross validation, evaluation of results, and a t-test of differences. The alpha value used is 0.05: if the probability is > 0.05, H0 is accepted, whereas if the probability is < 0.05, H0 is rejected. The algorithm with the best performance, with an accuracy of 98.35%, is Neural Network. In the t-test, the algorithms with the best models are Random Forest and Neural Network; Naïve Bayes is fairly good, Decision Tree is adequate, and the weakest algorithm is K-Nearest Neighbour (K-NN).
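The alpha = 0.05 decision rule can be sketched for two algorithms scored on the same 10 cross-validation folds. This is a sketch under assumptions: the abstract does not say whether the t-test was paired, so a paired test is assumed here, and the per-fold accuracies are hypothetical. Comparing |t| with the two-sided critical value for 9 degrees of freedom (2.262) is equivalent to comparing the p-value with 0.05.

```python
import math
import statistics

T_CRITICAL_DF9 = 2.262  # two-sided critical value for alpha = 0.05, 9 degrees of freedom

def paired_t_statistic(scores_a, scores_b):
    """t statistic of the per-fold differences between two algorithms."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation; must be non-zero
    return mean_diff / (sd_diff / math.sqrt(len(diffs)))

def reject_h0(scores_a, scores_b, critical=T_CRITICAL_DF9):
    """True when |t| exceeds the critical value, i.e. p < 0.05 for 10 folds."""
    return abs(paired_t_statistic(scores_a, scores_b)) > critical

# Hypothetical per-fold accuracies (not taken from the paper)
algo_a = [0.98, 0.97, 0.99, 0.98, 0.98, 0.97, 0.99, 0.98, 0.98, 0.99]
algo_b = [0.90, 0.91, 0.89, 0.92, 0.90, 0.91, 0.90, 0.89, 0.91, 0.90]
algo_c = [0.97, 0.98, 0.98, 0.99, 0.97, 0.98, 0.98, 0.99, 0.99, 0.98]

differs_ab = reject_h0(algo_a, algo_b)  # large, consistent gap between folds
differs_ac = reject_h0(algo_a, algo_c)  # differences cancel out across folds
```

The hard-coded critical value applies only to 10 folds; a different number of folds changes the degrees of freedom and therefore the threshold.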

Performance Analysis Based on Data Mining Technique in Predicting the Diabetic Disease - Decision tree and Naïve Bayes

2019 1st International Conference on Advances in Information Technology (ICAIT) ◽

10.1109/icait47043.2019.8987382 ◽

2019 ◽

Author(s):

Karthikeyan S. M ◽

Gopinath C B ◽

Chethan P. J ◽

Manikanta J

Keyword(s):

Data Mining ◽

Performance Analysis ◽

Decision Tree ◽

Naive Bayes ◽

Naïve Bayes ◽

Data Mining Technique ◽

Mining Technique

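Adatbányászati technikák alkalmazása magyar vállalkozások adatait tartalmazó adatbázison Microsoft Excel 2007-ben [Application of Data Mining Techniques to a Database of Hungarian Enterprises in Microsoft Excel 2007]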

Jelenkori Társadalmi és Gazdasági Folyamatok ◽

10.14232/jtgf.2010.1-2.229-233 ◽

2010 ◽

Vol 5 (1-2) ◽

pp. 229-233

Author(s):

György Hampel ◽

Zoltán Fabulya ◽

Elemérné Nagy

Keyword(s):

Data Mining ◽

Naive Bayes ◽

Naïve Bayes ◽

The Other ◽

Annual Income ◽

Data Mining Technique ◽

Microsoft Excel ◽

Mining Technique ◽

Bayes Algorithm ◽

Main Activity

Using a simple data mining technique, Analyze Key Influencers in the Excel 2007 Data Mining Add-ins, we searched for relationships among the seat (county and town), the form of business, the main activity, the number of employees and the annual income of Hungarian companies. This technique uses the Naive Bayes algorithm. According to this method, the seat has no key influencers. Most of the main activities have no influencers, but some (82 out of 495) are related to the other criteria, mainly the form of business. The form of business (all 30 categories), the number of employees (17 of 18 categories) and the annual income (all 9 categories) are each other's key influencers. Cramér's measure of association was used to check the results of the data mining. The Cramér contingency coefficient gave results similar to the data mining, but also indicated that the strength of association was at most moderate in all cases. The highest associations were between annual income and number of employees (0.46, moderate association), main activity and form of business (0.36, moderate association), and annual income and form of business (0.27, low association).
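The association check can be sketched as follows, assuming the coefficient meant is Cramér's V computed from the chi-square statistic of a contingency table. The tables in the example are hypothetical and only exercise the two extremes of the measure.

```python
import math

def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts.

    Assumes every row total and column total is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))

# Hypothetical 2x2 tables showing the extremes of the measure
perfect = cramers_v([[10, 0], [0, 10]])   # categories fully determine each other
independent = cramers_v([[5, 5], [5, 5]]) # no association at all
```

V ranges from 0 (no association) to 1 (complete association), so the reported values of 0.46, 0.36 and 0.27 sit in the lower half of the scale.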

Prediksi Ketepatan Kelulusan Mahasiswa Diploma dengan Komparasi Algoritma Klasifikasi [Predicting On-Time Graduation of Diploma Students by Comparing Classification Algorithms]

Jurnal Sistem dan Teknologi Informasi (JustIN) ◽

10.26418/justin.v7i3.33316 ◽

2019 ◽

Vol 7 (3) ◽

pp. 202

Author(s):

Muhammad Sony Maulana ◽

Raja Sabarudin ◽

Wahyu Nugraha

Keyword(s):

Data Mining ◽

Random Forest ◽

Decision Tree ◽

Naive Bayes ◽

Naïve Bayes ◽

Rule Induction ◽

T Test

AMIK BSI Pontianak is a private higher-education institution with a large number of students, but it still faces a problem that recurs every year: the number of students who graduate on time versus late. The number of students graduating on time is an indicator of the effectiveness of a higher-education institution, whether public or private. Institutions need to detect the behaviour of active students so that the factors that cause students not to graduate on time can be identified. This study compares five data mining methods to determine which is most optimal for determining on-time graduation, using the T-Test as the testing technique. The methods compared are Decision Tree, Naive Bayes, K-NN, Rule Induction and Random Forest. The results show that the Rule Induction and C4.5 algorithms perform best in determining the on-time graduation of diploma students at AMIK BSI Pontianak.

Supervised data mining approach for predicting student performance

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v16.i3.pp1584-1592 ◽

2019 ◽

Vol 16 (3) ◽

pp. 1584 ◽

Cited By ~ 1

Author(s):

Wan Fairos Wan Yaacob ◽

Syerina Azlin Md Nasir ◽

Wan Faizah Wan Yaacob ◽

Norafefah Mohd Sobri

Keyword(s):

Data Mining ◽

Academic Success ◽

Student Performance ◽

Naive Bayes ◽

Naïve Bayes ◽

Classification Algorithm ◽

K Nearest Neighbor ◽

Data Mining Approach ◽

Accuracy Measure ◽

Student’s Performance

The data mining approach has been successfully implemented in higher education and has emerged as an interesting area of educational data mining research. The approach is intended to identify and extract new and potentially valuable knowledge from data. Predictive models developed with a supervised data mining approach can draw conclusions about students' academic success. The ability to predict students' performance can be beneficial for innovation in modern educational systems. The main objective of this paper is to develop predictive models using classification algorithms to predict students' performance at a selected university in Malaysia. The prediction model developed can be used to identify the most important attributes in the data. Several predictive modelling techniques, namely K-Nearest Neighbor, Naïve Bayes, Decision Tree and Logistic Regression, were used to predict whether a student's performance is excellent or non-excellent. Based on accuracy, precision, recall and the ROC curve, the results show that Naïve Bayes outperforms the other classification algorithms. The Naïve Bayes model indicates that the most significant factors in predicting excellent students are scoring A+ or A in Multivariate Analysis; A+, A or A- in SAS Programming; and A, A- or B+ in ITS 472.
