Application Development of Student's Graduation Classification Model based on The First 2 Years Performance using K-Nearest Neighbor

Mapping Intimacies ◽

10.31227/osf.io/ftwre ◽

2018 ◽

Author(s):

Purwono Prasetyawan ◽

Muhammad Faridz Abadi

Keyword(s):

Cross Validation ◽

Nearest Neighbor ◽

Educational Institution ◽

Training Data ◽

Classification Model ◽

K Nearest Neighbor ◽

Application Development ◽

K Value ◽

The Status ◽

Fold Cross Validation

A college holds a large amount of data, such as academic, administrative, and student biographical records, but this data has not been fully utilized. Students are an important asset for an educational institution, so the rate at which they graduate on time deserves attention. Because students differ in their ability to complete their studies on time, monitoring and evaluation are needed to surface new information that supports decision making. This study examines the relationship between study duration and six variables: the grade index (IP) for Semesters 1 to 4, gender, and student status, using the k-nearest neighbor (KNN) algorithm. Classifying student graduation with KNN on these variables, evaluated by k-fold cross-validation, gave mean accuracies of 88% for k=1, 88.67% for k=3, 93.78% for k=5, 86% for k=7, 86.22% for k=9, 92.44% for k=11, 89.55% for k=13, 93.78% for k=15, 99.78% for k=17, and 100% for k=19. The 500 training records split 188 to 312 by student status, with working students taking longer to complete their studies, and 290 men to 210 women by gender, with women taking longer to finish college. The optimal k value was sought using k-fold cross-validation, and the highest accuracy was obtained at k=19 (100%).
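
The search over odd k values described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the two-feature synthetic data, the class labels `on_time` and `late`, the choice of 10 folds, and the function names are all assumptions, since the abstract does not state them.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training points closest to x (Euclidean)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def cv_accuracy(X, y, k, n_folds=10, seed=0):
    """Mean accuracy of k-NN over n_folds cross-validation folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        tr_X = [X[i] for i in train_idx]
        tr_y = [y[i] for i in train_idx]
        hits = sum(knn_predict(tr_X, tr_y, X[i], k) == y[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / len(scores)


# Synthetic stand-in for semester grade indices (hypothetical values).
rng = random.Random(42)
X, y = [], []
for _ in range(60):
    X.append([rng.gauss(3.4, 0.2), rng.gauss(3.4, 0.2)])
    y.append("on_time")
    X.append([rng.gauss(2.4, 0.2), rng.gauss(2.4, 0.2)])
    y.append("late")

# Evaluate odd k from 1 to 19, as in the abstract, and keep the best one.
results = {k: cv_accuracy(X, y, k) for k in range(1, 20, 2)}
best_k = max(results, key=results.get)
print(best_k, results[best_k])
```

Odd k values avoid tied votes when there are two classes, which may be why the study reports only odd k.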

Optimization of k value and lag parameter of k-nearest neighbor algorithm on the prediction of hotel occupancy rates

Jurnal Teknologi dan Sistem Komputer ◽

10.14710/jtsiskom.2020.13648 ◽

2020 ◽

Vol 8 (3) ◽

pp. 246-254

Author(s):

Agus Subhan Akbar ◽

R. Hadapiningradja Kusumodestoni

Keyword(s):

Nearest Neighbor ◽

Business Management ◽

Training Data ◽

K Nearest Neighbor ◽

Nearest Neighbor Algorithm ◽

K Value ◽

Sample Data ◽

K Nearest Neighbor Algorithm ◽

Occupancy Rates ◽

Fold Cross Validation

Hotel occupancy rates are the most important factor in hotel business management. Predictions of the rates for the next few months inform the manager's decisions on arranging and providing the needed facilities. This study optimizes the lag parameter and the k value of the k-Nearest Neighbor (kNN) algorithm on historical hotel occupancy data. The historical data were arranged as supervised training data, with the number of columns per row determined by the lag parameter and the number of prediction targets. The kNN algorithm was applied using 10-fold cross-validation with k values from 1 to 30. The optimal lag was found in the range 14-17 and the optimal k in the range 5-13 for predicting occupancy rates 1, 3, 6, 9, and 12 months ahead. The optimal k values do not follow the rule of thumb that sets k to the square root of the number of samples.
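
Arranging a history as supervised rows by lag, as this abstract describes, can be sketched like this. The sinusoidal series, the averaging of neighbor targets, and the names `make_supervised` and `knn_forecast` are illustrative assumptions; the abstract does not specify how neighbor targets are combined.

```python
import math


def make_supervised(series, lag, horizon):
    """Build (window, target) rows: `lag` past values, target `horizon` steps after the window."""
    rows = []
    for start in range(len(series) - lag - horizon + 1):
        window = series[start:start + lag]
        target = series[start + lag + horizon - 1]
        rows.append((window, target))
    return rows


def knn_forecast(rows, query, k):
    """Average the targets of the k windows closest to `query` (Euclidean distance)."""
    nearest = sorted(rows, key=lambda r: math.dist(r[0], query))[:k]
    return sum(target for _, target in nearest) / len(nearest)


# Hypothetical monthly occupancy (%) with a yearly cycle; not real hotel data.
series = [60 + 20 * math.sin(2 * math.pi * m / 12) for m in range(72)]
history, actual = series[:-1], series[-1]

lag, horizon, k = 14, 1, 5  # values inside the ranges reported in the abstract
rows = make_supervised(history, lag, horizon)
forecast = knn_forecast(rows, history[-lag:], k)
print(round(forecast, 2), round(actual, 2))
```

A larger `horizon` reuses the same windows but pairs each with a later target, which is how one table of history can serve the 1, 3, 6, 9, and 12 month predictions.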

Aplikasi Prediksi Kelulusan Mahasiswa Berbasis K-Nearest Neighbor (K-NN)

JTIM : Jurnal Teknologi Informasi dan Multimedia ◽

10.35746/jtim.v1i1.11 ◽

2019 ◽

Vol 1 (1) ◽

pp. 30-36 ◽

Cited By ~ 1

Author(s):

Lalu Abd Rahman Hakim ◽

Ahmad Ashril Rizal ◽

Dwi Ratnasari

Keyword(s):

Nearest Neighbor ◽

Educational Institution ◽

Confusion Matrix ◽

K Nearest Neighbor ◽

Study Program ◽

K Value ◽

Student Graduation ◽

K Nearest Neighbor Algorithm ◽

Communication Planning ◽

Fold Cross Validation

Students are important assets for an educational institution, so attention must be paid to the rate at which they graduate on time. The rise and fall in the percentage of students who complete their studies on time is one of the elements of campus accreditation assessment. According to data from the Study Program Section, over the last 3 years only 25% of students completed their studies on time. This study uses the K-Nearest Neighbor algorithm to predict graduation for new cases by adapting solutions from previous cases that are close to the new case. The algorithm computes the closeness of the new case to the old cases, and the majority class among the K closest cases determines whether the student is predicted to graduate on time or not. The study follows Roger S. Pressman's waterfall method, namely Communication, Planning, Modeling, and Construction. In tests using K-Fold Cross Validation, the highest accuracy in the third model was 80% at the 4th fold and 61% when K = 1. Testing with the Confusion Matrix gave the highest accuracy of 98% at K = 1 for the class "Timely" and 98% at K = 2 for the class "Not Timely".
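
The per-class figures quoted from the confusion matrix can be illustrated with a small sketch. It assumes that "accuracy for a class" means the share of that class's cases predicted correctly (recall); the abstract does not define it, and the label lists below are made up for illustration.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count (actual, predicted) label pairs."""
    return Counter(zip(actual, predicted))


def per_class_recall(cm, label):
    """Share of cases whose true class is `label` that were predicted as `label`."""
    total = sum(n for (a, _), n in cm.items() if a == label)
    return cm[(label, label)] / total if total else 0.0


def overall_accuracy(cm):
    """Share of all cases on the diagonal of the matrix."""
    correct = sum(n for (a, p), n in cm.items() if a == p)
    return correct / sum(cm.values())


# Hypothetical test outcomes for ten students.
actual = ["Timely"] * 4 + ["Not Timely"] * 6
predicted = (["Timely"] * 3 + ["Not Timely"]
             + ["Not Timely"] * 5 + ["Timely"])

cm = confusion_matrix(actual, predicted)
print(per_class_recall(cm, "Timely"),
      per_class_recall(cm, "Not Timely"),
      overall_accuracy(cm))
```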

KLASIFIKASI STATUS PEMBAYARAN PREMI MENGGUNAKAN ALGORITMA NEIGHBOR WEIGHTED K-NEAREST NEIGHBOR (NWKNN) (STUDI KASUS: PT. BUMIPUTERA KOTA SAMARINDA)

VARIANCE : Journal of Statistics and Its Applications ◽

10.30598/variancevol1iss2page56-63 ◽

2020 ◽

Vol 1 (2) ◽

pp. 56-63

Author(s):

Grassella Gunsyang ◽

Ika Purnamasari ◽

Fidia Deny Tisna Amijaya

Keyword(s):

Cross Validation ◽

Nearest Neighbor ◽

K Nearest Neighbor ◽

Fold Cross Validation

The Neighbor Weighted K-Nearest Neighbor (NWKNN) algorithm is an extension of the K-Nearest Neighbor (KNN) algorithm that assigns a weight to each class to be classified. This study discusses classification with the NWKNN algorithm applied to premium payment status data. The aim is to find the optimal exponent (E) and neighborhood (K) values, as well as the classification accuracy for premium payment status data at PT. Bumiputera Kota Samarinda. The stages of the study are: determining E and K using k-fold cross validation, computing Euclidean distances, computing the weight and score of each class, taking the largest score as the classification result, and then computing the classification accuracy. The results show that the optimal values for classifying premium payment status at PT. Bumiputera Kota Samarinda using NWKNN are K=3 and E=6, with an accuracy of 75%.
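
The weight-and-score stages listed above can be sketched as below. The abstract does not give its formulas, so this uses one commonly cited NWKNN formulation as an assumption: each class weight is 1 / (class size / smallest class size)^(1/E), and a class score is that weight times the summed similarity of its neighbors. Turning Euclidean distance into similarity as 1 / (1 + distance), the labels, and the data points are likewise assumptions.

```python
import math
from collections import Counter, defaultdict


def class_weights(labels, exponent):
    """Down-weight large classes: 1 / (size / smallest_size) ** (1 / exponent)."""
    counts = Counter(labels)
    smallest = min(counts.values())
    return {c: 1.0 / (n / smallest) ** (1.0 / exponent) for c, n in counts.items()}


def nwknn_predict(train_X, train_y, x, k, exponent):
    """Score each class among the k nearest neighbors and return the top class."""
    weights = class_weights(train_y, exponent)
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    scores = defaultdict(float)
    for i in nearest:
        # Assumed distance-to-similarity conversion; not taken from the paper.
        similarity = 1.0 / (1.0 + math.dist(train_X[i], x))
        scores[train_y[i]] += weights[train_y[i]] * similarity
    return max(scores, key=scores.get), dict(scores)


# Hypothetical imbalanced data: 8 "on_time" payers, 2 "late" payers.
train_X = [[0.2, 0.0], [0.0, 0.3], [5, 5], [5, 6], [6, 5], [6, 6], [7, 5], [5, 7],
           [0.1, 0.0], [9, 9]]
train_y = ["on_time"] * 8 + ["late"] * 2

label, scores = nwknn_predict(train_X, train_y, [0.0, 0.0], k=3, exponent=2)
print(label, scores)
```

In this toy case two of the three nearest neighbors are `on_time`, so plain majority-vote KNN would pick `on_time`; the class weight is what lets the minority class win.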

Tone Classification Matches Kodàly Handsign with the K-Nearest Neighbor Method at Leap Motion Controller

International Journal on Information and Communication Technology (IJoICT) ◽

10.21108/ijoict.2019.52.283 ◽

2020 ◽

Vol 5 (2) ◽

pp. 40

Author(s):

Muhammad Croassacipto ◽

Muhammad Ichwan ◽

Dina Budhi Utami

Keyword(s):

Music Education ◽

Nearest Neighbor ◽

Human Interaction ◽

Training Data ◽

Leap Motion ◽

K Nearest Neighbor ◽

Motion Controller ◽

K Value ◽

Leap Motion Controller ◽

Natural Function

Hands can produce a variety of poses, each of which can carry a meaning or purpose and serve as a form of communication, as determined by general agreement among those who communicate. A hand pose used as a hand sign makes human interaction with the computer faster, more intuitive, and in line with the natural function of the human body. One example is the Kodály hand sign, created by the Hungarian composer Zoltán Kodály as a concept in music education in Hungary. In interactive angklung performances, this hand sign determines the tone to be played through a K-Nearest Neighbor (KNN) classification of hand poses. The classification is performed on data extracted from the Leap Motion Controller, which provides Pitch, Roll, and Yaw values based on basic aircraft principles. The experiment was run five times, with k = 1, 3, 5, 7, and 9, on test data consisting of 874 Do', 702 Si, 913 La, 612 Sol, 661 Fa, 526 Mi, 891 Re, and 1004 Do poses, against 21,099 training samples. The tests show that hand poses are recognized best at k=1, with an accuracy of 94.87%.

Phishing Website Detection Using Machine Learning Classifiers Optimized by Feature Selection

Traitement du signal ◽

10.18280/ts.370403 ◽

2020 ◽

Vol 37 (4) ◽

pp. 563-569

Author(s):

Dželila Mehanović ◽

Jasmin Kevrić

Keyword(s):

Feature Selection ◽

Random Forest ◽

Cross Validation ◽

Nearest Neighbor ◽

Security Threats ◽

Selection Methods ◽

K Nearest Neighbor ◽

Machine Learning Classifiers ◽

Time To Build ◽

Fold Cross Validation

Security is one of the most topical issues in the online world, and lists of security threats are constantly updated. One of those threats is phishing websites. In this work, we address the problem of classifying phishing websites. Three classifiers were used: K-Nearest Neighbor, Decision Tree, and Random Forest, together with the feature selection methods from Weka. The achieved accuracy was 100%, and the number of features was reduced to seven. Moreover, reducing the number of features also reduced the time to build the models. For Random Forest, build time fell from the initial 2.88 s and 3.05 s for percentage split and 10-fold cross-validation to 0.02 s and 0.16 s, respectively.

Perbandingan Akurasi dan Waktu Proses Algoritma K-NN dan SVM dalam Analisis Sentimen Twitter

Jurnal Informatika ◽

10.31311/ji.v6i2.5129 ◽

2019 ◽

Vol 6 (2) ◽

pp. 226-235

Author(s):

Muhammad Rangga Aziz Nasution ◽

Mardhiya Hayaty

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Unsupervised Learning ◽

Supervised Learning ◽

Cross Validation ◽

Nearest Neighbor ◽

Support Vector ◽

K Nearest Neighbor ◽

Fold Cross Validation

Machine learning, a branch of computer science, has become a trend in recent years. Machine learning uses data and algorithms to build models from the patterns in a data set. It also studies how a trained model can predict outputs based on those patterns. Two kinds of machine learning methods can be used for sentiment analysis: supervised learning and unsupervised learning. This study compares two supervised classification algorithms, K-Nearest Neighbor and Support Vector Machine, by building a model with each algorithm on sentiment text. The comparison determines which algorithm is better in terms of accuracy and processing time. In accuracy, Support Vector Machine performs better, with 89.70% without K-Fold Cross Validation and 88.76% with K-Fold Cross Validation. In processing time, K-Nearest Neighbor performs better, with 0.0160 s without K-Fold Cross Validation and 0.1505 s with K-Fold Cross Validation.

Analisis Komparatif Evaluasi Performa Algoritma Klasifikasi pada Readmisi Pasien Diabetes

Jurnal Buana Informatika ◽

10.24002/jbi.v7i4.770 ◽

2016 ◽

Vol 7 (4) ◽

Author(s):

Mochammad Yusa ◽

Ema Utami ◽

Emha T. Luthfi

Keyword(s):

Data Mining ◽

Decision Tree ◽

Cross Validation ◽

Nearest Neighbor ◽

Naive Bayes ◽

Kappa Statistic ◽

Naïve Bayes ◽

Validation Dataset ◽

K Nearest Neighbor ◽

Fold Cross Validation

Abstract. Readmission is associated with quality measures for patients in hospitals. The many attributes related to diabetic patients, such as medication, ethnicity, race, lifestyle, and age, make the calculation of care quality complicated. Classification techniques from data mining can address this problem. In this paper, three classifiers, Decision Tree, k-Nearest Neighbor (k-NN), and Naive Bayes, are evaluated under various parameter settings using 10-Fold Cross Validation. Performance is evaluated in terms of Accuracy, Mean Absolute Error (MAE), and the Kappa Statistic. The selected dataset consists of 47 attributes and 49,735 records. The results show that the k-NN classifier with k=100 performs better in terms of accuracy and Kappa Statistic, while Naive Bayes outperforms the other classifiers in terms of MAE. Keywords: k-NN, naive bayes, diabetes, readmission

Abstrak (translated from Indonesian). Readmission is associated with measuring the quality of patient care in hospitals. Differences in the attributes related to diabetic patients, such as medication, ethnicity, race, lifestyle, and age, make the quality calculation complicated. Data mining classification techniques can offer a solution for this calculation. Classification is one of the data mining techniques that has developed significantly. In this study, the Decision Tree, k-Nearest Neighbor (k-NN), and Naive Bayes classification algorithms are evaluated under various parameter settings on Accuracy, Mean Absolute Error (MAE), and Kappa Statistic, using 10-Fold Cross Validation. The evaluated dataset has 47 attributes and 49,735 records. The results show that the best accuracy, MAE, and Kappa Statistic were obtained from the Naive Bayes model. Keywords: k-NN, naive bayes, diabetes, readmission
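
For readers unfamiliar with the Kappa Statistic used as a metric here, the sketch below computes Cohen's kappa, which corrects observed accuracy for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The label names and counts are hypothetical and do not come from the paper's dataset.

```python
from collections import Counter


def cohen_kappa(actual, predicted):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(actual)
    observed = sum(a == p for a, p in zip(actual, predicted)) / n
    actual_freq = Counter(actual)
    predicted_freq = Counter(predicted)
    # Chance agreement: probability both pick the same label independently.
    expected = sum(actual_freq[c] * predicted_freq[c] for c in actual_freq) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical outcomes for 100 patients.
actual = ["readmitted"] * 50 + ["not_readmitted"] * 50
predicted = (["readmitted"] * 40 + ["not_readmitted"] * 10
             + ["not_readmitted"] * 30 + ["readmitted"] * 20)

kappa = cohen_kappa(actual, predicted)
print(kappa)
```

Here the raw accuracy is 70%, but half of that agreement would be expected by chance on balanced classes, so kappa is noticeably lower than accuracy. This is why the two metrics can rank classifiers differently.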

CASE BASE REASONING UNTUK MENENTUKAN KEBUTUHAN BAHAN BANGUNAN RUMAH

SINTECH (Science and Information Technology) Journal ◽

10.31598/sintechjournal.v2i1.224 ◽

2018 ◽

Vol 1 (2) ◽

pp. 70-75

Author(s):

Abdul Rozaq

Keyword(s):

Test Data ◽

Building Materials ◽

Cross Validation ◽

Nearest Neighbor ◽

Training Data ◽

Consultation Process ◽

Case Base ◽

Case Base Reasoning ◽

House Building ◽

Fold Cross Validation

Building materials are an important factor in building a house, and consumers or developers need to estimate the funds required to build one. To address this, the study uses a case-based reasoning (CBR) approach, which solves new problems by reasoning from existing cases. The system built in this study is a CBR system for determining house building material requirements. In the consultation process, a new case is entered and compared with the old cases, and the similarity value is calculated using nearest neighbor. The first test, which entered test data and compared it with each house type, obtained an accuracy of 83.6%. The second test used K-fold Cross Validation with K = 25 on 200 records, dividing the data into 192 training records and 8 test records. Under K-fold Cross Validation, the CBR system achieved an accuracy of 85.71%.
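
The 192/8 split follows directly from the arithmetic of 25 folds over 200 records (200 / 25 = 8 test records per fold). A minimal sketch of such a split, assuming contiguous unshuffled folds, which the abstract does not specify:

```python
def kfold_indices(n_samples, n_folds):
    """Yield (train_idx, test_idx) pairs for contiguous, near-equal folds."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


splits = list(kfold_indices(200, 25))
train_idx, test_idx = splits[0]
print(len(splits), len(train_idx), len(test_idx))
```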

KLASIFIKASI DOKUMEN TUGAS AKHIR (SKRIPSI) MENGGUNAKAN K-NEAREST NEIGHBOR

JISKA (Jurnal Informatika Sunan Kalijaga) ◽

10.14421/jiska.2019.41-07 ◽

2019 ◽

Vol 4 (1) ◽

pp. 69

Author(s):

Kitami Akromunnisa ◽

Rahmat Hidayat

Keyword(s):

Test Data ◽

Cross Validation ◽

Nearest Neighbor ◽

Data Distribution ◽

Training Data ◽

K Nearest Neighbor ◽

Electronic Documents ◽

Digital Version ◽

Abstract Data

Various scientific works by academics, such as theses, research reports, and practical work reports, are available in digital form. In general, however, this growth is not matched by growth in the amount of information or knowledge that can be extracted from these electronic documents. This study aims to classify the abstracts of informatics engineering theses. The algorithm used is K-Nearest Neighbor. The data comprise 50 Indonesian abstracts, 454 English abstracts, and 504 titles. Each data set is divided into training data and test data, and the test data are classified automatically with the resulting classifier model. For the Indonesian abstracts, classification was more accurate without stemming: accuracy was 100.0% with a 9:1 split, compared with 90.0% at 8:2, 80.0% at 7:3, 60.0% at 6:4, and 80.0% when the data were split using k-fold cross validation.

Bayes Classifier dan Support Vector Machine dalam Klasifikasi Judul Karya Akhir Mahasiswa Program Studi PTIK UNJ

PINTER Jurnal Pendidikan Teknik Informatika dan Komputer ◽

10.21009/pinter.3.1.9 ◽

2019 ◽

Vol 3 (1) ◽

pp. 54-62

Author(s):

Razi Aziz Syahputro ◽

Widodo ◽

Hamidillah Ajie

Keyword(s):

Support Vector Machine ◽

Cross Validation ◽

Nearest Neighbor ◽

Confusion Matrix ◽

Vector Space Model ◽

Support Vector ◽

Bayes Classifier ◽

K Nearest Neighbor ◽

Space Model ◽

Fold Cross Validation

This study was motivated by the need for a classification system to help the Department of Electrical Engineering, in particular the PTIK study program, classify thesis titles by specialization. Before building the system, several existing classification algorithms had to be weighed, so this study selected 3 of the top 10 algorithms according to ICDM 2006. Classifying short text documents such as student thesis titles is harder than classifying long text documents, because the fewer the words, the harder the classification. The aim of this study is therefore to find the most effective algorithm for classifying thesis titles. The study consists of several stages: data collection, grouping of the data through a questionnaire completed by expert lecturers, text pre-processing, term weighting using the vector space model and tf-idf, evaluation with k-fold cross validation, classification using k-nearest neighbor, the naïve Bayes classifier, and support vector machine, and analysis with a confusion matrix. The experiment used 266 thesis titles of PTIK UNJ students from the 2010-2013 cohorts, with the most recent data coming from thesis defenses in semester 105 (odd semester 2016/2017). Among these algorithms, the most efficient was support vector machine, with an accuracy of 82% over 10 trials.
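
The term-weighting stage can be sketched as follows. The tf-idf variant used here (term frequency normalized by title length, idf = log(N / df)), the whitespace tokenizer, and the three sample titles are assumptions for illustration; the abstract does not say which variant or pre-processing the study used.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse {term: weight} vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical thesis titles, not taken from the study's data.
titles = [
    "sistem informasi akademik berbasis web",
    "sistem informasi perpustakaan berbasis web",
    "jaringan komputer nirkabel kampus",
]
vecs = tfidf_vectors(titles)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

With titles this short, a single shared or missing word changes the similarity sharply, which illustrates why the abstract calls short-text classification harder than long-text classification.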
