Graduation Prediction System On Students Using C4.5 Algorithm

At Bumigora University College, the number of students entering and the number completing their studies are out of balance. Students enroll in large numbers, but the share who graduate on time falls below the specified standard. As a result, a large backlog of students accumulates in each graduation period. One way to address this problem is a data mining based system that monitors student progress and predicts graduation using the C4.5 algorithm. The research proceeded through problem analysis, data collection, data requirement analysis, data design, coding, and testing. The result is an implementation of the C4.5 algorithm that predicts whether a student will graduate on time. The data cover students who graduated from 2010 to 2012. Evaluated with a confusion matrix, the model reached 93.103% accuracy using 163 training records and 29 testing records (about 85% training data and 15% testing data). Based on these tests, the authors conclude that the C4.5 algorithm is well suited to predicting student graduation.
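
As a rough check on the figures above, the following Python sketch computes accuracy from a 2x2 confusion matrix. The individual cell counts are hypothetical, chosen only so that 27 of the 29 testing records are classified correctly; the abstract reports the accuracy and the data sizes but not the matrix itself. Under that assumption, 27/29 agrees with the reported accuracy, and 163 of 192 records is close to the stated 85% training share.

```python
def accuracy(tp, tn, fp, fn):
    """Share of correct predictions in a 2x2 confusion matrix."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical cell counts (not from the paper): 27 correct out of 29.
acc = accuracy(tp=15, tn=12, fp=1, fn=1)

# Training share implied by the reported 163 training and 29 testing records.
train_share = 163 / (163 + 29)

print(f"accuracy = {acc:.3%}, training share = {train_share:.1%}")
```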

MENGANALISIS KEMUNGKINAN KETERLAMBATAN PEMBAYARAN SPP DENGAN ALGORITMA C4.5 (STUDI KASUS POLITEKNIK TEDC BANDUNG)

Jurnal Techno Nusa Mandiri ◽ 10.33480/techno.v16i2.659 ◽ 2019 ◽ Vol 16 (2) ◽ pp. 93-98

Author(s): Tri Herdiawan Apandi ◽ Roby Bayu Maulana ◽ Rian Piarna ◽ Dwi Vernanda

Keyword(s): Confusion Matrix ◽ Total Sample ◽ Training Data ◽ Data Partition ◽ Tuition Fees ◽ Testing Data ◽ C4.5 Algorithm ◽ Decisive Action ◽ The Many ◽ Comparison Of The Results

Tuition payments, as one source of funds, play an important role in sustaining the operations of higher education institutions. The problem is that students often fail to pay on time. One cause of the many late tuition payments is the lack of policy and decisive action from the campus when students pay late; factors relating to parents and students also contribute to the delay. The purpose of this study is to classify students as late or on time in paying tuition (SPP) using the C4.5 algorithm. The total sample was divided in four ways: partition 1 with 90% training data and 10% testing data, partition 2 with 80% and 20%, partition 3 with 70% and 30%, and partition 4 with 60% and 40%. The C4.5 classification results were evaluated and validated with cross validation and a confusion matrix to determine the algorithm's accuracy in predicting late SPP payments. Comparing the evaluation and validation results shows that partition 2 has the best accuracy of the four, at 75%. Keywords: Data Mining, Decision Tree (C4.5), SPP.
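
The four training/testing partitions described above can be illustrated with a short Python sketch. The record list, the `partition` helper, and the fixed seed are all assumptions made for illustration; the abstract does not give the sample size or say how records were assigned to each side.

```python
import random

def partition(records, train_fraction, seed=0):
    """Shuffle a copy of the records and cut it into train and test parts."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical data set of 100 record ids (the real sample size is not stated).
records = list(range(100))

sizes = {}
for fraction in (0.9, 0.8, 0.7, 0.6):
    train, test = partition(records, fraction)
    sizes[fraction] = (len(train), len(test))

print(sizes)
```

Each partition would then be used to train and evaluate a separate model, so that the resulting accuracies can be compared.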

Prediksi Kelulusan Mahasiswa Stikom Tunas Bangsa Prodi Sistem Informasi Dengan Menggunakan Algoritma C4.5

BRAHMANA: Jurnal Penerapan Kecerdasan Buatan ◽ 10.30645/brahmana.v2i2.71 ◽ 2021 ◽ Vol 2 (2) ◽ pp. 97-106

Author(s): Lydia Yohana Lumban Gaol ◽ M. Safii ◽ Dedi Suhendro

Keyword(s): Data Mining ◽ Information Systems ◽ Training Data ◽ Assessment Process ◽ Study Program ◽ Testing Data ◽ C4.5 Algorithm ◽ Student Graduation ◽ The University ◽ Level Student

Graduation is an important element in the accreditation assessment of an institution or university. Predicting student graduation in the Information Systems Study Program at STIKOM Tunas Bangsa Pematangsiantar is important so that students who may not graduate on time can be identified early. Data mining can be applied to predict student graduation, and classification is a method often used for this purpose. This research uses the C4.5 algorithm. The training data come from alumni of the Information Systems Study Program at STIKOM Tunas Bangsa, and data on final-year students of the same program are used as testing data. The results are expected to provide information on predicted on-time graduation and to serve as input for the university in making decisions for future improvement.
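
C4.5 chooses the attribute to split on by gain ratio, that is, information gain divided by split information. The Python sketch below shows that calculation on a toy example. The attribute names and values (`gpa_band`, `gender`, `on_time`) are hypothetical and are not taken from the study's data; the sketch covers only categorical attributes and omits the rest of the algorithm (continuous thresholds, pruning).

```python
from collections import Counter
from math import log2

def entropy(items):
    """Shannon entropy (in bits) of a list of categorical values."""
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())

def gain_ratio(values, labels):
    """Information gain of splitting on `values`, divided by split information."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0

# Hypothetical toy records: does an attribute separate on-time graduates?
on_time = ["yes", "yes", "no", "no"]
gpa_band = ["high", "high", "low", "low"]   # separates the classes perfectly
gender = ["f", "m", "f", "m"]               # carries no information here

print(gain_ratio(gpa_band, on_time), gain_ratio(gender, on_time))
```

In this toy case the first attribute would be preferred for the split, since its gain ratio is higher.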

Multi Class Data Classification to Improve Accuracy in Sentiment Analysis using Machine Learning

International Journal for Research in Applied Science and Engineering Technology ◽ 10.22214/ijraset.2021.35291 ◽ 2021 ◽ Vol 9 (VI) ◽ pp. 1457-1461

Author(s): Daram Vishnu

Keyword(s): Machine Learning ◽ Sentiment Analysis ◽ Confusion Matrix ◽ Training Data ◽ Natural Languages ◽ Parts Of Speech ◽ Testing Data ◽ Improve Accuracy ◽ Textual Form ◽ Speech Tagging

Sentiment analysis means classifying a text into different emotional classes. Most current sentiment analysis techniques use binary or ternary classification; in this paper we classify movie reviews into 5 classes. Multi-class sentiment analysis can reveal the exact sentiment of a review, not just the polarity of a textual statement from positive to negative. It has always been a challenging task because natural languages are difficult to represent mathematically. The number of features is also generally large, which requires substantial computational power, so to reduce the number of features we use parts-of-speech tagging with textblob to extract the important ones. Sentiment analysis is done using machine learning, which requires training data to fit a model and testing data to evaluate it. Various models are trained and tested, and one is finally selected based on its accuracy and confusion matrix. Analyzing reviews in textual form is important because a large number of reviews are available across the web, and doing so can help firms find out how their products are received in the market. In this paper, sentiment analysis is demonstrated on movie reviews taken from the IMDB website.

DCT Untuk Ekstraksi Fitur Berbasis GLCM Pada Identifikasi Batik Menggunakan K-NN

Jambura Journal of Electrical and Electronics Engineering ◽ 10.37905/jjeee.v3i1.7113 ◽ 2021 ◽ Vol 3 (1) ◽ pp. 1-6

Author(s): Zulfrianto Yusrin Lamasigi

Keyword(s): Feature Extraction ◽ Discrete Cosine Transform ◽ Confusion Matrix ◽ Training Data ◽ Gray Level ◽ K Nearest Neighbor ◽ Cosine Transform ◽ A Value ◽ Testing Data ◽ Occurrence Matrix

Batik is a specially made cloth. It is unique because its motifs are based on cultural elements of the region where it was made. Each batik motif and color is different, so it is difficult to identify the origin of a given motif. This study aims to improve feature extraction results for batik motif identification. The method applies the Discrete Cosine Transform (DCT) to improve Gray Level Co-Occurrence Matrix (GLCM) feature extraction and thereby obtain better identification accuracy, while K-Nearest Neighbor (K-NN) measures the closeness between training and testing batik images based on the extracted feature values. Four experiments were carried out at angles of 0°, 45°, 90°, and 135°, with k = 1, 3, 5, 7, and 9. A confusion matrix was used to calculate the accuracy of the K-NN classification. Using 602 training images and 344 testing images across all classes, the highest DCT-GLCM accuracy was 84.88% at 135° with k = 3, and the lowest was 41.86% at 0° with k = 7 and 9. Using GLCM alone, the highest accuracy was 77.90% at 135° with k = 1, and the lowest was 40.69% at 90° with k = 7. The trials show that DCT works well for improving GLCM feature extraction, as evidenced by the average accuracy obtained.
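
The K-NN classification step described above can be sketched in a few lines of Python. This is a minimal illustration using Euclidean distance and a majority vote; the two-dimensional feature vectors and the class names are hypothetical stand-ins for the extracted texture features, and the DCT and GLCM extraction steps are not shown.

```python
from collections import Counter
from math import dist

def knn_predict(train_x, train_y, query, k):
    """Label a query point by majority vote among its k nearest training points."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D feature vectors for two motif classes.
train_x = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1), (1.2, 0.8)]
train_y = ["motif_A", "motif_A", "motif_B", "motif_B", "motif_B"]

pred_near_a = knn_predict(train_x, train_y, (0.2, 0.1), k=3)
pred_near_b = knn_predict(train_x, train_y, (1.0, 0.9), k=3)
print(pred_near_a, pred_near_b)
```

Repeating the prediction over a testing set for each value of k is one way to obtain per-k accuracies like those reported in the abstract.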

PREDIKSI KELULUSAN MAHASISWA MAGISTER TEKNIK INFORMATIKA UNIVERSITAS AMIKOM YOGYAKARTA MENGGUNAKAN METODE K-NEAREST NEIGHBOR

Respati ◽ 10.35842/jtir.v13i2.260 ◽ 2018 ◽ Vol 13 (2)

Author(s): Eri Sasmita Susanto ◽ Kusrini Kusrini ◽ Hanif Al Fatta

Keyword(s): Nearest Neighbor ◽ Nearest Neighbors ◽ Training Data ◽ K Nearest Neighbor ◽ Process Data ◽ K Nearest Neighbors ◽ Testing Data ◽ Estimation Scheme ◽ Student Graduation ◽ Feasibility Test

ABSTRACT: This research examines the feasibility of predicting student graduation at Universitas AMIKOM Yogyakarta. The authors chose the K-Nearest Neighbors (K-NN) algorithm because it can process numerical data and does not require a complicated iterative parameter estimation scheme, which means it can be applied to large datasets. The input of the system is sample student data from 2014-2015. Testing uses two data sets: testing data and training data. The criteria used include grade point index (IP) for semesters 1-4, credits earned (SKS), and graduation status. The output is a predicted graduation outcome in one of two classes: on time or not on time. The tests show that k = 14 with 5-fold cross validation (k-fold = 5) gave the best performance when using the grade point index of four semesters, with accuracy = 98.46%, precision = 99.53%, and recall = 97.64%. Keywords: K-Nearest Neighbors Algorithm, Graduation Prediction, Testing Data, Training Data
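
Precision and recall, reported above alongside accuracy, are computed from the cells of the confusion matrix. The Python sketch below shows the standard definitions; the counts are hypothetical, since the abstract does not give the underlying matrix.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Hypothetical counts for the "on time" class (not from the paper).
precision, recall = precision_recall(tp=8, fp=2, fn=1)
print(f"precision = {precision:.2%}, recall = {recall:.2%}")
```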

Perbandingan Akurasi, Recall, dan Presisi Klasifikasi pada Algoritma C4.5, Random Forest, SVM dan Naive Bayes

JURNAL MEDIA INFORMATIKA BUDIDARMA ◽ 10.30865/mib.v5i2.2937 ◽ 2021 ◽ Vol 5 (2) ◽ pp. 640

Author(s): Mulkan Azhari ◽ Zakaria Situmorang ◽ Rika Rosnelly

Keyword(s): Random Forest ◽ Naive Bayes ◽ Naïve Bayes ◽ Training Data ◽ Random Forest Algorithm ◽ Svm Algorithm ◽ Testing Data ◽ C4.5 Algorithm ◽ Bayes Algorithm ◽ Using Data

This study compares the performance of several classification algorithms: C4.5, Random Forest, SVM, and Naive Bayes. The research data consist of 200 JISC participant records, with 140 (70%) used as training data and 60 (30%) as testing data. The classification was simulated with the RapidMiner data mining tool. The C4.5 algorithm obtained an accuracy of 86.67%, Random Forest 83.33%, SVM 95%, and Naive Bayes 86.67%. SVM had the highest accuracy and Random Forest the lowest.
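
With 60 testing records, each reported accuracy corresponds to a whole number of correct predictions. The Python sketch below works those counts out. They are an inference from the percentages and the test-set size, not figures stated in the abstract.

```python
# Accuracies (in percent) and test-set size as reported in the abstract.
reported = {"C4.5": 86.67, "Random Forest": 83.33, "SVM": 95.0, "Naive Bayes": 86.67}
test_size = 60

# Implied number of correctly classified test records (an inference).
implied_correct = {name: round(pct / 100 * test_size) for name, pct in reported.items()}

# Recompute each percentage from the implied count as a consistency check.
recomputed = {name: round(n / test_size * 100, 2) for name, n in implied_correct.items()}

best = max(reported, key=reported.get)
worst = min(reported, key=reported.get)
print(implied_correct, best, worst)
```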

IMPLEMENTASI K-NEAREST NEIGHBORD PADA RAPIDMINER UNTUK PREDIKSI KELULUSAN MAHASISWA

High Education of Organization Archive Quality: Jurnal Teknologi Informasi ◽ 10.52972/hoaq.vol10no1.p35-41 ◽ 2018 ◽ Vol 10 (1) ◽ pp. 35-41

Author(s): Sumarlin Sumarlin ◽ Dewi Anggraini

Keyword(s): Cross Validation ◽ Nearest Neighbor ◽ Confusion Matrix ◽ Training Data ◽ K Nearest Neighbor ◽ Process Data ◽ Nearest Neighbor Algorithm ◽ Student Graduation ◽ K Nearest Neighbor Algorithm ◽ Auc Value

Data on graduates is an important factor in determining the quality of a university, private or public, and is among the important items assessed in the accreditation process. Graduate data at STIKOM Uyelindo Kupang grows every year and accumulates like neglected data because it is rarely used. To turn student data into information the university can use, this study uses it as training data for data mining in order to predict the graduation of STIKOM Uyelindo Kupang students. The method is K-Nearest Neighbor, run in RapidMiner to measure its accuracy on the graduate data. The criteria used were student name, gender, and cumulative achievement index (GPA) from semester 1 to 6. The K-Nearest Neighbor algorithm can be used to produce graduation predictions. Its performance was measured with cross validation, a confusion matrix, and ROC curves, using 5-fold cross validation. On 100 graduate records from STIKOM Uyelindo Kupang, accuracy reached 82%, and the classification is rated very good because its AUC value of 0.971 falls in the 0.90-1.00 range. The authors conclude that the accuracy of the graduation model using the K-Nearest Neighbor (K-NN) algorithm is influenced by the number of data clusters. The highest accuracy and AUC value under 5-fold validation were obtained with k = 4, with an accuracy of 90%.
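
The 5-fold cross validation mentioned above can be sketched as follows. This is a plain Python illustration of how 100 records might be divided into folds, using contiguous blocks without shuffling; the `k_fold_indices` helper is hypothetical, and RapidMiner's own sampling options may behave differently.

```python
def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) for k contiguous folds over n records."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size

# 100 records and 5 folds, as in the abstract.
folds = list(k_fold_indices(100, 5))
for train, test in folds:
    print(len(train), len(test))
```

Each record appears in the testing part of exactly one fold, and the model's reported accuracy is typically the average over the folds.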

Comparison Analysis of K-Nearest Neighbor and Naïve Bayes in Determining Talent of Adolescence

International Journal of Artificial Intelligence Research ◽ 10.29099/ijair.v4i1.118 ◽ 2020 ◽ Vol 4 (1)

Author(s): Yessi Jusman ◽ Widdya Rahmalina ◽ Juni Zarman

Keyword(s): Nearest Neighbor ◽ Naive Bayes ◽ Confusion Matrix ◽ Naïve Bayes ◽ Training Data ◽ K Nearest Neighbor ◽ Combined Training ◽ Testing Data ◽ Bayes Algorithm ◽ Children's Interests

Adolescents are always searching for an identity that shapes their character. This paper uses artificial intelligence analysis to determine the talents of adolescents. The study uses a sample of children aged 10-18 years, with testing data consisting of 100 respondents. The algorithms used are K-Nearest Neighbor and Naive Bayes, and the analysis compares the classification accuracy of the two. To identify which algorithm more accurately determines children's interests and talents, accuracy was measured with a confusion matrix in RapidMiner for training data, testing data, and combined training and testing data. The study concludes that the K-Nearest Neighbor algorithm is better than Naive Bayes in terms of classification accuracy.

IMPLEMENTASI ALGORITMA BACKPROPAGATION UNTUK MEMPREDIKSI KELULUSAN MAHASISWA

KLIK - KUMPULAN JURNAL ILMU KOMPUTER ◽ 10.20527/klik.v5i2.152 ◽ 2018 ◽ Vol 5 (2) ◽ pp. 169

Author(s): Muhammad Dedek Yalidhan

Keyword(s): Neural Network ◽ Artificial Neural Network ◽ Information System ◽ Prediction Accuracy ◽ Confusion Matrix ◽ The Other ◽ Training Data ◽ Backpropagation Algorithm ◽ Testing Data ◽ Artificial Neural

Student graduation is one of the elements of the college accreditation standard set by BAN-PT. Information Systems is one of the study programs at STMIK Banjarbaru. So far, no application has been implemented to predict late graduation among its students, and the on-time graduation rate tends to be low every year. Accurate graduation prediction can therefore help the study program make the right decisions to prevent late graduation. In this research, the backpropagation algorithm of an artificial neural network is implemented in an application whose output is late graduation or on-time graduation. The research uses 318 data samples, of which 70% are used as training data and 30% as testing data. From the confusion matrix, the prediction accuracy is 98.97%. Keywords: student's graduation, artificial neural network, backpropagation, confusion matrix

Characterization of Depositional Facies Using Artificial Intelligence Method Based on Electrical Log Data

Proc. Indon. Petrol. Assoc., Digital Technical Conference, 2020 ◽ 10.29118/ipa20-sg-133 ◽ 2020

Author(s): M. Mahendra

Keyword(s): Gamma Ray ◽ Confusion Matrix ◽ Training Data ◽ Gradient Boosting ◽ Primary Method ◽ Learning Models ◽ Depositional Facies ◽ Testing Data ◽ Neutron Porosity

This study focuses on identifying depositional facies in uncored intervals using a gradient boosting classifier, based on electric logs, namely gamma-ray (GR), resistivity (ILD), neutron porosity (NPHI), and density (RHOB), together with facies descriptions and classifications derived from cored intervals. Supervised learning with a gradient boosting classifier is the primary method; it combines many weak learning models to create a robust predictive model. A gradient boosting classifier was applied because the output will be in the form of images. We used nine wells, four as training data and five as testing data, with gamma-ray, resistivity, NPHI, and RHOB as input. Statistical methods were used to distribute facies on each well, and we used the F1 score and the average of the confusion matrix to validate the values. The results show an F1 score of 0.718 (71.8%) and a confusion matrix average of 0.6617 (66.17%). With this level of accuracy, we conclude that gradient boosting classifier methods are reliable enough to determine facies in areas with limited core data, giving satisfactory and efficient results without reducing accuracy.
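
For a multi-class problem such as facies classification, an F1 score can be derived from the confusion matrix. The Python sketch below computes a macro-averaged F1 (the unweighted mean of per-class F1 scores). The matrix values are hypothetical, and the abstract does not say which averaging scheme the authors used, so macro averaging here is an assumption.

```python
def macro_f1(cm):
    """Macro-averaged F1 from a square confusion matrix (rows = true, columns = predicted)."""
    n = len(cm)
    scores = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, actually another class
        fn = sum(cm[c]) - tp                       # actually c, predicted another class
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n

# Hypothetical 2-class confusion matrix (not from the paper).
cm = [[8, 2],
      [1, 9]]
score = macro_f1(cm)
print(f"macro F1 = {score:.4f}")
```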
