Implementation of K-Nearest Neighbor in Predicting Student Dropout (Implementasi K-Nearest Neighbor Dalam Prediksi Mahasiswa Berhenti Kuliah)

A student is a person enrolled at a university, whether public or private. Being a student is the dream of many young people around the world, and it is the starting gate that determines the field of science a person will enter, be it computer science, medicine, education, or another field. However, many students suddenly decide to stop attending lectures because of external and internal factors. This causes losses for the campus, among them a reduction in the quantity of student data and a resulting accumulation of data. It is therefore necessary to predict which students are likely to stop studying unilaterally by looking at several criteria and mining the data of students with the potential to quit college, applying the K-NN algorithm. In this study, the K-NN algorithm stores old data and looks for similarities with new data in an effort to recognize the patterns of students dropping out of college. The results for the new student data show that it is similar to the old data of students who dropped out, with the closest values among the cases being 17.3815 and 19.98875, so the new student is predicted to be likely to drop out of college.
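The nearest-case comparison described in this abstract can be sketched as a plain K-NN vote over Euclidean distances. This is a minimal illustration, not the authors' implementation: the feature columns (GPA, attendance percentage, unpaid semesters), the records, and the function names `euclidean` and `knn_predict` are all hypothetical, and the abstract does not state which distance measure or value of K was used.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k=3):
    """Return the majority label among the k stored cases closest to query.

    train is a list of (features, label) pairs.
    """
    neighbors = sorted(train, key=lambda row: euclidean(row[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical old cases: (GPA, attendance %, unpaid semesters) -> outcome.
train = [
    ((3.5, 95.0, 0), "stays"),
    ((3.2, 90.0, 0), "stays"),
    ((3.0, 85.0, 0), "stays"),
    ((2.0, 40.0, 2), "drops out"),
    ((1.8, 35.0, 3), "drops out"),
    ((2.2, 50.0, 1), "drops out"),
]

new_student = (2.1, 45.0, 2)
prediction = knn_predict(train, new_student, k=3)
print(prediction)
```

In practice the features would usually be scaled to comparable ranges first, since an unscaled column such as attendance percentage dominates the distance.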

Classification of Fish Species with Image Data Using K-Nearest Neighbor

International Journal of Computer and Information System (IJCIS) ◽

10.29040/ijcis.v2i2.33 ◽

2021 ◽

Vol 2 (2) ◽

pp. 54-58

Author(s):

Kaharuddin Kaharuddin ◽

Eka Wahyu Sholeha

Keyword(s):

Computer Science ◽

Everyday Life ◽

Fish Species ◽

Nearest Neighbor ◽

Image Data ◽

Test Results ◽

Shape Features ◽

K Nearest Neighbor ◽

The World

Abstract: Classification is a technique many of us encounter in everyday life, and classification science continues to grow and to be applied to various types of data and cases. In computer science, classification has been developed to facilitate human work. One example of its application is classifying fish species: the number of fish species in the world is so large that many people still struggle to tell them apart. This study therefore classifies fish species using the K-Nearest Neighbor method. The data cover 4 types of fish and total 160 records. The purpose of the study was to test the K-Nearest Neighbor method for classifying fish species based on color, texture, and shape features. In the test results, the highest accuracy, 77.50%, was obtained with K = 7, and the second-highest, 76.88%, with K = 10. From these results it can be concluded that the K-Nearest Neighbor method classifies fairly well, and that the work could be extended by adding variables, adding more data, and using other types of fish.
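Comparing accuracy across values of K, as the abstract reports for K = 7 and K = 10, can be sketched as a simple sweep over a held-out test set. Everything below is an assumption for illustration: the data are synthetic three-feature points standing in for color, texture, and shape features, the class names and the 128/32 split are invented, and only the totals (4 classes, 160 records) mirror the abstract.

```python
import math
import random
from collections import Counter


def knn_predict(train, query, k):
    """Majority label among the k training points closest to query."""
    neighbors = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]


def accuracy(train, test, k):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


# Synthetic stand-in data: 4 hypothetical species, 40 samples each.
rng = random.Random(0)
centers = {
    "species_a": (0.0, 0.0, 0.0),
    "species_b": (3.0, 0.0, 1.0),
    "species_c": (0.0, 3.0, 2.0),
    "species_d": (3.0, 3.0, 3.0),
}
data = [
    (tuple(rng.gauss(c, 1.5) for c in center), label)
    for label, center in centers.items()
    for _ in range(40)
]
rng.shuffle(data)
train, test = data[:128], data[128:]

# Evaluate every K from 1 to 10 and keep the best-scoring one.
scores = {k: accuracy(train, test, k) for k in range(1, 11)}
best_k = max(scores, key=scores.get)
for k, score in scores.items():
    print(f"K = {k:2d}  accuracy = {score:.2%}")
print("best K:", best_k)
```

A single split gives a noisy estimate on 160 records, so cross-validation would be a reasonable refinement.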

Computational Intelligence-Based Model for Mortality Rate Prediction in COVID-19 Patients

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18126429 ◽

2021 ◽

Vol 18 (12) ◽

pp. 6429

Author(s):

Irfan Ullah Khan ◽

Nida Aslam ◽

Malak Aljabri ◽

Sumayh S. Aljameel ◽

Mariam Moataz Aly Kamaleldin ◽

...

Keyword(s):

Mortality Rate ◽

Computational Intelligence ◽

Nearest Neighbor ◽

Gradient Boosting ◽

K Nearest Neighbor ◽

Detection And Identification ◽

Proposed Model ◽

Extreme Gradient Boosting ◽

The World ◽

Detection And Diagnosis

The COVID-19 outbreak is currently one of the biggest challenges facing countries around the world. Millions of people have lost their lives due to COVID-19. Therefore, the accurate early detection and identification of severe COVID-19 cases can reduce the mortality rate and the likelihood of further complications. Machine Learning (ML) and Deep Learning (DL) models have been shown to be effective in the detection and diagnosis of several diseases, including COVID-19. This study used ML algorithms, namely Decision Tree (DT), Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), and K-Nearest Neighbor (KNN), and a DL model (containing six layers with ReLU activation and an output layer with sigmoid activation) to predict the mortality rate in COVID-19 cases. Models were trained using data on confirmed COVID-19 patients from 146 countries. Comparative analysis was performed among the ML and DL models using a reduced feature set. The best results were achieved using the proposed DL model, with an accuracy of 0.97. Experimental results reveal the significance of the proposed model over the baseline study in the literature with the reduced feature set.
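The shape of the DL model mentioned in this abstract (ReLU layers followed by a sigmoid output) can be sketched as a forward pass. This is one reading of the description, not the authors' network: the layer widths, the eight-value input, the random untrained weights, and the helper names are all assumptions, and the abstract does not say whether "six layers" includes the output layer.

```python
import math
import random


def relu(values):
    """Element-wise max(0, x)."""
    return [max(0.0, x) for x in values]


def sigmoid(x):
    """Squash a real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dense(values, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [sum(w * x for w, x in zip(row, values)) + b
            for row, b in zip(weights, biases)]


def init_layer(rng, n_in, n_out):
    """Random weights and zero biases (untrained, for illustration only)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x, hidden_layers, output_layer):
    """Apply the ReLU layers in order, then a single sigmoid output unit."""
    for weights, biases in hidden_layers:
        x = relu(dense(x, weights, biases))
    weights, biases = output_layer
    return sigmoid(dense(x, weights, biases)[0])


rng = random.Random(0)
sizes = [8, 16, 16, 8, 8, 4, 4]  # hypothetical: input width + six ReLU layers
hidden_layers = [init_layer(rng, a, b) for a, b in zip(sizes, sizes[1:])]
output_layer = init_layer(rng, sizes[-1], 1)

patient = [0.2, 0.7, 0.0, 1.0, 0.5, 0.1, 0.9, 0.3]  # made-up scaled features
risk = forward(patient, hidden_layers, output_layer)
print(f"predicted probability: {risk:.3f}")
```

The sigmoid output is what makes the result readable as a probability for a binary outcome; with untrained weights the number itself carries no meaning.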

Building a predictive model to assist in the diagnosis of cervical cancer

Future Oncology ◽

10.2217/fon-2021-0767 ◽

2021 ◽

Author(s):

Emmanuel Kwateng Drokow ◽

Adu Asare Baffour ◽

Clement Yaw Effah ◽

Clement Agboyibor ◽

Gloria Selorm Akpabla ◽

...

Keyword(s):

Cervical Cancer ◽

Nearest Neighbor ◽

Learning Models ◽

K Nearest Neighbor ◽

Effective Technique ◽

Ensemble Technique ◽

Cervical Cancer Risk ◽

Average Accuracy ◽

The World ◽

Machine Learning Models

Aim: Cervical cancer is still one of the most common gynecologic cancers in the world. Since cervical cancer is a potentially preventable cancer, earlier detection is the most effective technique for decreasing the worldwide incidence of the illness. Materials and methods: This research presents a novel ensemble technique for predicting cervical cancer risk. Specifically, the authors introduce a voting classifier that aggregates prediction probabilities from multiple machine-learning models: logistic regression, K-nearest neighbor, decision tree, XGBoost and multilayer perceptron. Results: The average accuracy, precision, recall and f1-score of the voting classifier were 96.6, 97.4, 95.9 and 96.6, respectively. The voting algorithm thus achieves high average values on all evaluation metrics (accuracy, precision, recall and f1-score). The f1-score of the algorithm is 96%, which demonstrates the robustness of the model. Conclusion: The findings suggest that the probability of having cervical cancer can be accurately predicted using the voting technique.
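Aggregating prediction probabilities from several models, as described above, is commonly done by soft voting: average the per-class probabilities and pick the class with the highest mean. The sketch below assumes equal model weights and uses invented probabilities; the abstract does not give the weighting, and the function name `soft_vote` is hypothetical.

```python
def soft_vote(probabilities, weights=None):
    """Average per-class probabilities across models and pick the top class.

    probabilities: one list of class probabilities per model.
    weights: optional per-model weights (equal weights if omitted).
    Returns (predicted_class_index, averaged_probabilities).
    """
    n_models = len(probabilities)
    weights = weights or [1.0] * n_models
    total = sum(weights)
    n_classes = len(probabilities[0])
    averaged = [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]
    predicted = max(range(n_classes), key=averaged.__getitem__)
    return predicted, averaged


# Made-up outputs for one patient: [P(no cancer), P(cancer)] per model.
model_outputs = {
    "logistic_regression": [0.40, 0.60],
    "k_nearest_neighbor": [0.55, 0.45],
    "decision_tree": [0.30, 0.70],
    "xgboost": [0.35, 0.65],
    "multilayer_perceptron": [0.45, 0.55],
}

label, averaged = soft_vote(list(model_outputs.values()))
print(label, [round(p, 2) for p in averaged])
```

Here K-nearest neighbor alone would predict class 0, but the averaged probabilities favor class 1, which is the kind of disagreement an ensemble is meant to smooth out.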

Performance of Classifiers on Newsgroups using Specific Subset of Terms

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.a4652.119119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 2497-2500

Keyword(s):

Text Classification ◽

Text Categorization ◽

Nearest Neighbor ◽

Vital Role ◽

Support Vector ◽

Classification Algorithms ◽

K Nearest Neighbor ◽

Specific Subset ◽

The World ◽

Good Classification

Text classification plays a vital role in the world of data mining, and the same is true of classification algorithms in text categorization. There are many techniques for text classification, but this paper focuses on three approaches: support vector machine (SVM), Naïve Bayes (NB), and k-nearest neighbor (k-NN). The paper reports the results of these classifiers on the mini-newsgroups data, which consists of a large number of documents, and walks through step-by-step tasks such as listing the files, preprocessing, creating terms (a specific subset of terms), and applying the classifiers to the specific subset of the dataset. After the experiments on the dataset, it is concluded that SVM achieves good classification output in terms of accuracy, precision, F-measure and recall, while the k-NN approach has good execution time.
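The precision, recall and F-measure used to compare the classifiers can be computed from predicted and true labels as in the following sketch. The label names and the six-document example are invented, and the function treats one class as positive, which is an assumption about how a multi-class newsgroup task would be scored per class.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical labels for six documents from two newsgroup categories.
y_true = ["sci", "sci", "sci", "rec", "rec", "rec"]
y_pred = ["sci", "sci", "rec", "rec", "rec", "sci"]

precision, recall, f1 = precision_recall_f1(y_true, y_pred, positive="sci")
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

For a multi-class task these per-class values are typically averaged across classes.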

Application of Data Mining to COVID-19 Data Using Classification Algorithms (PENERAPAN DATA MINING TERHADAP DATA COVID-19 MENGGUNAKAN ALGORITMA KLASIFIKASI)

Jurnal Informatika ◽

10.30873/ji.v21i1.2868 ◽

2021 ◽

Vol 21 (1) ◽

pp. 44-52

Author(s):

Rizka Dahlia ◽

Nanik Wuryani ◽

Sri Hadianti ◽

Windu Gata ◽

Arina Selawati

Keyword(s):

Data Mining ◽

South Korea ◽

Respiratory System ◽

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

K Nearest Neighbor ◽

The World ◽

Bayes Algorithm

Coronavirus 2019, more commonly referred to as COVID-19, is a type of virus that attacks the respiratory system. The spread of the virus and the number of deaths it causes continue to increase. As of April 21, 2020, based on data from the WHO, the total number of cases infected with this virus reached 2,397,217 with 162 deaths from all over the world. For South Korea, as of March 21, 2020, the total number of infected cases was 10,683 with a total of 237 deaths. In this study, the researchers processed data on the spread of COVID-19 in South Korea with RapidMiner using the classification algorithms Naïve Bayes, C4.5, and K-Nearest Neighbor, following the stages of selection, preprocessing, transformation, data mining, and interpretation or evaluation. The best accuracy, 80.79% with an AUC of 0.881, was achieved by the Naïve Bayes algorithm. The data distribution shows that the attribute influencing the isolated class is the patient's sex, with more women experiencing isolation. Keywords: COVID-19, data mining, classification, C4.5, Naïve Bayes, K-NN

Implementation of K-Nearest Neighbor with Cosine Similarity for Classification Abstract International Journal of Computer Science

2018 International Conference on Information Technology Systems and Innovation (ICITSI) ◽

10.1109/icitsi.2018.8696072 ◽

2018 ◽

Cited By ~ 1

Author(s):

Muhammad Nursalman ◽

Jajang Kusnendar ◽

Ulva Fatma Fadhila

Keyword(s):

Computer Science ◽

Nearest Neighbor ◽

Cosine Similarity ◽

K Nearest Neighbor

Comparison of Machine Learning Algorithms for Predicting Course Takers (Perbandingan Algoritme Machine Learning untuk Memprediksi Pengambil Matakuliah)

Jurnal Teknologi Informasi dan Ilmu Komputer ◽

10.25126/jtiik.2019651755 ◽

2019 ◽

Vol 6 (5) ◽

pp. 543 ◽

Cited By ~ 1

Author(s):

Fitra A. Bachtiar ◽

Indra K. Syahputra ◽

Satrio A. Wicaksono

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Nearest Neighbor ◽

Classification Method ◽

Support Vector ◽

Test Results ◽

K Nearest Neighbor ◽

Student Data ◽

Prediction Problems ◽

Imbalance Dataset

Abstract: At the beginning of each semester, the academic office schedules and determines the courses to be offered in the next semester. However, this process has problems, such as opening too many classes compared with the number of interested students, or the reverse. In addition, in prediction problems the collected data tend to be imbalanced across classes (class imbalance). These problems can lead to ineffective scheduling, so a system is needed that can predict which students will take a course. Many algorithms can be used for this prediction, and this study compares the performance of algorithms for classifying students who take courses. Predictions are modeled on attributes of the student data, namely Grades, GPA, Cumulative GPA, Semester Credits, Cumulative Semester Credits and Semester. For each observation of these attributes, the classifier predicts whether the student will take a particular course. The results fall into 2 classes: 'Yes' for students predicted to take the course and 'No' for students predicted not to take it. The Synthetic Minority Oversampling Technique (SMOTE) is used to handle the imbalanced data. The classification methods compared in this study are the k-Nearest Neighbor (k-NN) and Support Vector Machine (SVM) algorithms. The tests used 3 courses as a sample. On average, k-NN predictions performed better than SVM. In addition, using SMOTE can influence the classification results in the form of increased AUC, CA, F1, precision and recall values.
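The oversampling step named in this abstract, SMOTE, creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbors. The sketch below shows that core idea only and is not the authors' pipeline: the two features (GPA, semester credits), the records, the neighbor count `k`, and the function name `smote` are hypothetical.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points between minority-class neighbors.

    minority: list of feature tuples from the under-represented class
    (at least two points are required).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base_index = rng.randrange(len(minority))
        base = minority[base_index]
        others = [p for i, p in enumerate(minority) if i != base_index]
        # The k minority points closest to the chosen base sample.
        neighbors = sorted(others, key=lambda p: math.dist(p, base))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Hypothetical minority class ('Yes' = takes the course): (GPA, credits).
minority_yes = [(3.1, 20.0), (3.4, 22.0), (2.9, 18.0), (3.6, 24.0)]
new_points = smote(minority_yes, n_new=6, k=2)
for point in new_points:
    print(tuple(round(v, 2) for v in point))
```

Synthetic samples should be generated from the training split only, so that the test set still reflects the original class balance.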

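Improving Backpropagation Classification Accuracy Using Artificial Bee Colony and K-NN for Heart Disease (Peningkatan Akurasi Klasifikasi Backpropagation Menggunakan Artificial Bee Colony dan K-NN Pada Penyakit Jantung)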

JURNAL MEDIA INFORMATIKA BUDIDARMA ◽

10.30865/mib.v5i1.2634 ◽

2021 ◽

Vol 5 (1) ◽

pp. 208

Author(s):

Pandito Dewa Putra ◽

Sukemi Sukemi ◽

Dian Palupi Rini

Keyword(s):

Blood Pressure ◽

Heart Disease ◽

Artificial Bee Colony ◽

Nearest Neighbor ◽

K Nearest Neighbor ◽

Backpropagation Algorithm ◽

Model Accuracy ◽

Bee Colony ◽

The World ◽

K Nearest Neighbor Algorithm

Heart disease ranks as the leading cause of death in the world, accounting for around 17.3 million deaths per year, with causes such as high blood pressure, diabetes, cholesterol fluctuation, fatigue, and others, which are collected in a dataset. The dataset used is the Cleveland heart disease dataset, with fourteen (14) attributes. This research applies the Backpropagation algorithm to classify heart disease, combined with the Artificial Bee Colony and k-Nearest Neighbor algorithms for feature (attribute) selection, because this technique can increase the accuracy of the classifier model, which reached 94.23%.

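Classification of COVID-19 Data Rate Levels for Spread Mitigation Using the Modified K-Nearest Neighbor (MKNN) Method (Klasifikasi Tingkat Laju Data Covid-19 Untuk Mitigasi Penyebaran Menggunakan Metode Modified K-Nearest Neighbor (MKNN))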

Jurnal Teknologi Informasi dan Ilmu Komputer ◽

10.25126/jtiik.2021834400 ◽

2021 ◽

Vol 8 (3) ◽

pp. 595

Author(s):

Imam Cholissodin ◽

Felicia Marvela Evanita ◽

Jeffrey Junior Tedjasulaksana ◽

Kukuh Wicaksono Wahyuditomo

Keyword(s):

World Health Organization ◽

Nearest Neighbor ◽

Assessment System ◽

World Health ◽

K Nearest Neighbor ◽

Spread Rate ◽

Social Distancing ◽

The World ◽

Rate Class ◽

Health Organization

Abstract: COVID-19, or Coronavirus Disease 2019, is a disease caused by a virus that can be transmitted through the respiratory tract in animals or humans. It has killed thousands of people almost all over the world and has been declared a pandemic in many countries, including Indonesia. The first COVID-19 case in Indonesia was found on March 2, 2020. In handling the pandemic, the government implemented social distancing, keeping people more than 1 meter apart, and health protocols for activities outside the home, as recommended by the World Health Organization (WHO). Low public awareness of social distancing and health protocols in Indonesia has led to a significant increase in positive COVID-19 cases and to many deaths. In this study we therefore built a system that classifies the rate level of COVID-19 data for spread mitigation in all provinces of Indonesia using the Modified K-Nearest Neighbor (MKNN) method. The output is a spread-rate class: a low spread rate, meaning spread mitigation is high; a medium spread rate, meaning spread mitigation is moderate; and a high spread rate, meaning spread mitigation is low, as explained further in the research methodology section. The system output aims to raise the awareness of the Indonesian people in preventing COVID-19 by showing the spread-rate class of each province in Indonesia. MKNN was chosen because it is a fairly good classification method that performs validation and weighting, with the weight determined by calculating the fraction of neighbors with the same label out of the total number of neighbors. The parameters used in the classification process are the number of positive cases, the number of people who recovered, and the number of people who died from COVID-19. The data come from the official website of the Ministry of Health of the Republic of Indonesia, accessible at https://infeksiemerging.kemkes.go.id/, with 374 training records from May 12, 2020 to May 22, 2020 and 136 test records from May 23, 2020 to May 26, 2020. The resulting accuracy was 97.79% with K = 3.
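The validation and weighting idea in MKNN can be sketched as follows: each training point first gets a validity score equal to the fraction of its nearest training neighbors that share its label, and each neighbor's vote for a query is then scaled by that validity and by inverse distance. The abstract describes only the validity fraction; the weight formula validity / (distance + 0.5), the neighbor counts, the data values, and the function names are assumptions based on a commonly cited MKNN formulation, not details confirmed by this paper.

```python
import math
from collections import defaultdict


def validity(train, h):
    """Fraction of each point's h nearest training neighbors sharing its label."""
    scores = []
    for i, (features, label) in enumerate(train):
        others = [row for j, row in enumerate(train) if j != i]
        nearest = sorted(others, key=lambda row: math.dist(row[0], features))[:h]
        scores.append(sum(lbl == label for _, lbl in nearest) / len(nearest))
    return scores


def mknn_predict(train, query, k=3, h=3, alpha=0.5):
    """Weighted vote among the k nearest neighbors of query.

    Each vote is validity / (distance + alpha); alpha = 0.5 is an assumed
    smoothing constant, not a value given in the abstract.
    """
    valid = validity(train, h)
    ranked = sorted(range(len(train)),
                    key=lambda i: math.dist(train[i][0], query))[:k]
    votes = defaultdict(float)
    for i in ranked:
        distance = math.dist(train[i][0], query)
        votes[train[i][1]] += valid[i] / (distance + alpha)
    return max(votes, key=votes.get)


# Made-up daily province records: (positive, recovered, died) -> spread rate.
train = [
    ((5, 3, 0), "low"), ((8, 4, 0), "low"), ((6, 5, 1), "low"),
    ((40, 20, 2), "medium"), ((45, 18, 3), "medium"), ((50, 25, 2), "medium"),
    ((200, 50, 10), "high"), ((220, 60, 12), "high"), ((180, 40, 9), "high"),
]

print(mknn_predict(train, (42, 19, 2), k=3))
```

Compared with plain K-NN, a training point surrounded by differently labeled neighbors gets a low validity and therefore contributes less to the vote.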

Predictive Data Mining Models for Novel Coronavirus (COVID-19) Infected Patients Recovery

10.21203/rs.3.rs-33247/v1 ◽

2020 ◽

Author(s):

L. J. Muhammad ◽

Md. Milon Islam ◽

Usman Sani Sharif ◽

Safial Islam Ayon

Keyword(s):

Data Mining ◽

Support Vector Machine ◽

Logistic Regression ◽

Random Forest ◽

Decision Tree ◽

Nearest Neighbor ◽

Support Vector ◽

K Nearest Neighbor ◽

The World ◽

Novel Coronavirus

Abstract: The novel coronavirus (COVID-19 or 2019-nCoV) pandemic has neither a clinically proven vaccine nor drugs; however, patients are recovering with the aid of antibiotic medications, antiviral drugs, and chloroquine, as well as vitamin C supplementation. It is now evident that the world needs a speedy solution to contain and tackle the further spread of COVID-19 with the aid of non-clinical approaches such as data mining, augmented intelligence and other artificial intelligence techniques, so as to mitigate the huge burden on the healthcare system while providing the best possible means for the diagnosis and prognosis of patients. In this study, data mining models were developed to predict the recovery of COVID-19 infected patients using an epidemiological dataset of COVID-19 patients in South Korea. The decision tree, support vector machine, naive Bayes, logistic regression, random forest, and K-nearest neighbor algorithms were applied directly to the dataset using the Python programming language to develop the models. The models predicted the minimum and maximum number of days for COVID-19 patients to recover from the virus, the age groups of patients who are at high risk of not recovering, those who are likely to recover, and those who are likely to recover quickly. The results show that the model developed with the decision tree algorithm is the most efficient at predicting the possibility of recovery of infected patients, with an overall accuracy of 99.85%, making it the best among the models developed, which also included support vector machine, naive Bayes, logistic regression, random forest, and K-nearest neighbor.
