K-Nearest Neighbor with K-Fold Cross Validation and Analytic Hierarchy Process on Data Classification

Author(s):  
Zoelkarnain Rinanda Tembusai ◽  
Herman Mawengkang ◽  
Muhammad Zarlis

This study analyzes the performance of the k-Nearest Neighbor method, using k-Fold Cross Validation to evaluate the model and the Analytic Hierarchy Process for feature selection, in order to obtain the best accuracy and the best machine learning model for data classification. The best test result was obtained in fold 3, with an accuracy of 95%. Evaluating the k-Nearest Neighbor model with k-Fold Cross Validation yields a good machine learning model, and the Analytic Hierarchy Process gives optimal results as a feature selection method; it can also reduce the workload of the k-Nearest Neighbor method, because the classifier uses only the features that were selected according to their level of importance for decision making.
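The evaluation scheme described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the toy data, the function names `knn_predict` and `k_fold_accuracies`, the fold count, and the choice of Euclidean distance are all assumptions, and the Analytic Hierarchy Process step is left out entirely. It returns one accuracy per fold, which is the kind of per-fold figure the abstract reports.

```python
import math
import random
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points closest to x (Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def k_fold_accuracies(X, y, n_folds=5, k=3, seed=0):
    """Return one accuracy per fold; every sample is used for testing exactly once."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]  # disjoint test sets
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        hits = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx)
        accuracies.append(hits / len(test_idx))
    return accuracies

# Toy data, made up for illustration: two well-separated clusters.
X = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2), (0.2, 0.4),
     (5.0, 5.1), (5.2, 4.9), (4.9, 5.3), (5.3, 5.2), (5.1, 4.8)]
y = ["a"] * 5 + ["b"] * 5
scores = k_fold_accuracies(X, y, n_folds=5, k=3)
print(scores, sum(scores) / len(scores))
```

Reporting the per-fold list as well as the mean makes it possible to see how much the result depends on which samples land in the test fold.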

2020 ◽  
Vol 37 (4) ◽  
pp. 563-569
Author(s):  
Dželila Mehanović ◽  
Jasmin Kevrić

Security is one of the most topical issues in the online world, and lists of security threats are constantly updated. One of those threats is phishing websites. In this work, we address the problem of classifying phishing websites. Three classifiers were used: K-Nearest Neighbor, Decision Tree and Random Forest, together with feature selection methods from Weka. The achieved accuracy was 100%, and the number of features was reduced to seven. Moreover, reducing the number of features also reduced the time needed to build the models. The build time for Random Forest decreased from the initial 2.88 s and 3.05 s for percentage split and 10-fold cross validation to 0.02 s and 0.16 s, respectively.


2019 ◽  
Vol 6 (2) ◽  
pp. 226-235
Author(s):  
Muhammad Rangga Aziz Nasution ◽  
Mardhiya Hayaty

Machine learning, a branch of computer science, has become a trend in recent years. Machine learning uses data and algorithms to build models from the patterns in a dataset. It also studies how a model, once built, can predict outputs based on those patterns. Two kinds of machine learning methods can be used for sentiment analysis: supervised learning and unsupervised learning. This study compares two supervised classification algorithms, K-Nearest Neighbor and Support Vector Machine, by building a model with each algorithm on sentiment text. The comparison aims to find out which algorithm is better in terms of accuracy and processing time. The accuracy results show that Support Vector Machine is superior, with 89.70% without K-Fold Cross Validation and 88.76% with K-Fold Cross Validation. In terms of processing time, K-Nearest Neighbor is superior, with 0.0160 s without K-Fold Cross Validation and 0.1505 s with K-Fold Cross Validation.


Author(s):  
Minh Tuan Le ◽  
Minh Thanh Vo ◽  
Nhat Tan Pham ◽  
Son V.T Dao

In the current health system, it is very difficult for medical practitioners/physicians to diagnose the effectiveness of heart contraction. In this research, we propose a machine learning model to predict heart contraction using an artificial neural network (ANN). We also propose a novel wrapper-based feature selection method that uses grey wolf optimization (GWO) to reduce the number of required input attributes. We compare the results achieved with our method against several conventional machine learning approaches: support vector machine, decision tree, K-nearest neighbor, naïve Bayes, random forest, and logistic regression. Computational results show not only that far fewer features are needed, but also that a higher prediction accuracy of around 87% can be achieved. This work has the potential to be applicable to clinical practice and to become a supporting tool for doctors/physicians.


Sensors ◽  
2020 ◽  
Vol 20 (18) ◽  
pp. 5362 ◽  
Author(s):  
Luca Antognoli ◽  
Sara Moccia ◽  
Lucia Migliorelli ◽  
Sara Casaccia ◽  
Lorenzo Scalise ◽  
...  

Background: Heartbeat detection is a crucial step in several clinical fields. Laser Doppler Vibrometer (LDV) is a promising non-contact measurement for heartbeat detection. The aim of this work is to assess whether machine learning can be used for detecting heartbeat from the carotid LDV signal. Methods: The performances of Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF) and K-Nearest Neighbor (KNN) were compared using the leave-one-subject-out cross-validation as the testing protocol in an LDV dataset collected from 28 subjects. The classification was conducted on LDV signal windows, which were labeled as beat, if containing a beat, or no-beat, otherwise. The labeling procedure was performed using electrocardiography as the gold standard. Results: For the beat class, the f1-score (f1) values were 0.93, 0.93, 0.95, 0.96 for RF, DT, KNN and SVM, respectively. No statistical differences were found between the classifiers. When testing the SVM on the full-length (10 min long) LDV signals, to simulate a real-world application, we achieved a median macro-f1 of 0.76. Conclusions: Using machine learning for heartbeat detection from carotid LDV signals showed encouraging results, representing a promising step in the field of contactless cardiovascular signal analysis.
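To make the testing protocol concrete, the following sketch shows how leave-one-subject-out splits and a per-class f1-score could be computed. It is an illustration under stated assumptions, not the authors' pipeline: the function names `leave_one_subject_out` and `f1_for_class`, the subject ids, and the window labels are hypothetical, and no classifier is trained here.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), once per subject.

    All windows from the held-out subject go to the test set, so no subject
    contributes to both training and testing in the same split.
    """
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test

def f1_for_class(y_true, y_pred, positive):
    """f1 = 2*TP / (2*TP + FP + FN) for the class named by `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Hypothetical window-level data: one subject id per signal window.
subjects = ["s1", "s1", "s2", "s2", "s3", "s3"]
splits = list(leave_one_subject_out(subjects))
for held_out, train, test in splits:
    print(held_out, "train:", train, "test:", test)

# Hypothetical labels and predictions for four windows.
y_true = ["beat", "beat", "no-beat", "no-beat"]
y_pred = ["beat", "no-beat", "no-beat", "beat"]
beat_f1 = f1_for_class(y_true, y_pred, positive="beat")
print(beat_f1)
```

Splitting by subject rather than by window matters because windows from one person are strongly correlated; a random window-level split would let the classifier see the test subject during training.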


Author(s):  
Grassella Gunsyang ◽  
Ika Purnamasari ◽  
Fidia Deny Tisna Amijaya

The Neighbor Weighted K-Nearest Neighbor (NWKNN) algorithm is an extension of the K-Nearest Neighbor (KNN) algorithm that assigns a weight to each class to be classified. This study discusses classification with the NWKNN algorithm applied to premium payment status data. The aim is to find the optimal exponent (E) and number of neighbors (K), as well as the accuracy of classifying premium payment status data at PT. Bumiputera Kota Samarinda. The stages of the study are: determining the values of E and K using k-fold cross validation, computing Euclidean distances, computing the weight and score of each class, taking the largest score as the classification result, and then computing the classification accuracy. The results show that the optimal values for classifying premium payment status at PT. Bumiputera Kota Samarinda with NWKNN are K=3 and E=6, with an accuracy of 75%.
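The abstract does not give formulas, so the sketch below uses the class weighting usually associated with NWKNN, weight(c) = 1 / (N_c / N_min)^(1/E), where N_c is the size of class c and N_min the size of the smallest class. How the Euclidean distance is turned into a neighbor score is an assumption here (similarity = 1 / (1 + distance)), as are the function names, the class labels, and the toy data.

```python
import math
from collections import Counter

def class_weights(labels, exponent):
    """Smaller classes get larger weights: 1 / (N_c / N_min) ** (1 / E)."""
    counts = Counter(labels)
    smallest = min(counts.values())
    return {c: 1.0 / (n / smallest) ** (1.0 / exponent) for c, n in counts.items()}

def nwknn_predict(train_X, train_y, x, k, exponent):
    """Pick the class with the largest weighted score among the k nearest neighbors."""
    weights = class_weights(train_y, exponent)
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    scores = Counter()
    for point, label in nearest:
        # Assumption: convert Euclidean distance into a similarity in (0, 1].
        similarity = 1.0 / (1.0 + math.dist(point, x))
        scores[label] += weights[label] * similarity
    return max(scores, key=scores.get)

# Toy imbalanced data, made up for illustration: six "major" points, two "minor".
train_X = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,), (7.0,), (9.0,)]
train_y = ["major"] * 6 + ["minor"] * 2
query = (6.0,)

w = class_weights(train_y, exponent=2)
pred_e2 = nwknn_predict(train_X, train_y, query, k=3, exponent=2)
pred_e6 = nwknn_predict(train_X, train_y, query, k=3, exponent=6)
print(w, pred_e2, pred_e6)
```

In this toy case the three nearest neighbors are two "major" points and one "minor" point. With E=2 the majority class is penalized enough for "minor" to win, while with E=6 the weights are closer to 1 and the result matches plain majority voting. That sensitivity is why E and K are tuned together by cross validation.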


Author(s):  
Mahendra Awale ◽  
Jean-Louis Reymond

Here we report PPB2 as a target prediction tool assigning targets to a query molecule based on ChEMBL data. PPB2 computes ligand similarities using molecular fingerprints encoding composition (MQN), molecular shape and pharmacophores (Xfp), and substructures (ECfp4), and features an unprecedented combination of nearest neighbor (NN) searches and Naïve Bayes (NB) machine learning, together with simple NN searches, NB and Deep Neural Network (DNN) machine learning models as further options. Although NN(ECfp4) gives the best results in terms of recall in a 10-fold cross-validation study, combining NN searches with NB machine learning provides superior precision statistics, as well as better results in a case study predicting off-targets of a recently reported TRPV6 calcium channel inhibitor, illustrating the value of this combined approach. PPB2 is available to assess possible off-targets of small molecule drug-like compounds by public access at ppb2.gdb.tools.


Author(s):  
Yuhong Huang ◽  
Wenben Chen ◽  
Xiaoling Zhang ◽  
Shaofu He ◽  
Nan Shao ◽  
...  

Aim: After neoadjuvant chemotherapy (NACT), tumor shrinkage pattern is a more reasonable outcome than pathological complete response (pCR) for deciding on possible breast-conserving surgery (BCS). The aim of this article was to establish a machine learning model combining radiomics features from multiparametric MRI (mpMRI) and clinicopathologic characteristics, for early prediction of tumor shrinkage pattern prior to NACT in breast cancer. Materials and Methods: This study included 199 patients with breast cancer who successfully completed NACT and subsequently underwent breast surgery. For each patient, 4,198 radiomics features were extracted from the segmented 3D regions of interest (ROI) in mpMRI sequences such as T1-weighted dynamic contrast-enhanced imaging (T1-DCE), fat-suppressed T2-weighted imaging (T2WI), and apparent diffusion coefficient (ADC) map. Feature selection and supervised machine learning algorithms were used to identify the predictors correlated with tumor shrinkage pattern as follows: (1) reducing the feature dimension by using ANOVA and the least absolute shrinkage and selection operator (LASSO) with 10-fold cross-validation, (2) splitting the dataset into a training dataset and a testing dataset, and constructing prediction models using 12 classification algorithms, and (3) assessing model performance through area under the curve (AUC), accuracy, sensitivity, and specificity. We also compared the most discriminative model in different molecular subtypes of breast cancer. Results: The Multilayer Perceptron (MLP) neural network achieved higher AUC and accuracy than other classifiers. The radiomics model achieved a mean AUC of 0.975 (accuracy = 0.912) on the training dataset and 0.900 (accuracy = 0.828) on the testing dataset with 30-round 6-fold cross-validation. When incorporating clinicopathologic characteristics, the mean AUC was 0.985 (accuracy = 0.930) on the training dataset and 0.939 (accuracy = 0.870) on the testing dataset. The model further achieved good AUC on the testing dataset with 30-round 5-fold cross-validation in three molecular subtypes of breast cancer as follows: (1) HR+/HER2–: 0.901 (accuracy = 0.816), (2) HER2+: 0.940 (accuracy = 0.865), and (3) TN: 0.837 (accuracy = 0.811). Conclusions: It is feasible that our machine learning model combining radiomics features and clinical characteristics could provide a potential tool to predict tumor shrinkage patterns prior to NACT. Our prediction model will be valuable in guiding NACT and surgical treatment in breast cancer.


Author(s):  
Muhammad Irfan ◽  
Setio Basuki ◽  
Yufis Azhar

The maternal mortality rate (MMR) reported in Indonesia's intercensal population survey (SUPAS) was considered high. For pregnancy risk detection, the public health center (puskesmas) applies a Poedji Rochjati screening card (KSPR) covering 20 features. In addition to the KSPR, pregnancy risk monitoring is assisted by a pregnancy control card. Because the two cards differ in their number of features, the features need to be reconciled between them. Our objectives are to determine the most influential features, to explore the links among features on the KSPR and pregnancy control cards, and to build a machine learning model for predicting pregnancy risk. For the first objective, we use correlation-based feature selection (CFS) and the C5.0 algorithm. The next objective was addressed by taking the union of the features produced by the two techniques. In the machine learning experiment on these features, the XGBoost algorithm achieved the highest accuracy, 94%, followed by the random forest, Naïve Bayes, and k-Nearest Neighbor algorithms with 87%, 66%, and 60%, respectively. Interpretability is provided with SHAP and LIME to give more insight into the classification model. In conclusion, the features on which the two interpretation approaches agreed confirmed that Cesar was dominant in determining pregnancy risk.


2016 ◽  
Vol 7 (4) ◽  
Author(s):  
Mochammad Yusa ◽  
Ema Utami ◽  
Emha T. Luthfi

Abstract. Readmission is associated with quality measures on patients in hospitals. Different attributes related to diabetic patients, such as medication, ethnicity, race, lifestyle, age, and others, make the calculation of quality of care tend to be complicated. Classification techniques from data mining can solve this problem. In this paper, three different classifiers, i.e. Decision Tree, k-Nearest Neighbor (k-NN), and Naive Bayes, are evaluated with various parameter settings using the 10-Fold Cross Validation technique. Performance is evaluated in terms of Accuracy, Mean Absolute Error (MAE), and Kappa Statistic. The selected dataset consists of 47 attributes and 49,735 records. The results show that the k-NN classifier with k=100 performs better in terms of accuracy and Kappa Statistic, but Naive Bayes outperforms the other classifiers in terms of MAE. Keywords: k-NN, naive Bayes, diabetes, readmission

Indonesian abstract (Abstrak). Readmission is associated with measuring the quality of patient care in hospitals. Differences in the attributes related to diabetic patients, such as medication, ethnicity, race, lifestyle, age, and others, tend to make the quality calculation complicated. Data mining classification techniques can offer a solution for this calculation. Classification is one of the data mining techniques that has developed quite significantly. In this study, Decision Tree, k-Nearest Neighbor (k-NN), and Naive Bayes classification models with various parameter settings are evaluated on Accuracy, Mean Absolute Error (MAE), and Kappa Statistic using 10-Fold Cross Validation. The evaluated dataset has 47 attributes and 49,735 records. The results show that the best accuracy, MAE, and Kappa Statistic were obtained with the Naive Bayes model. Keywords: k-NN, naive Bayes, diabetes, readmission
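For readers unfamiliar with the Kappa Statistic mentioned above, the sketch below computes accuracy and Cohen's kappa from a list of true and predicted labels, using kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The function name `accuracy_and_kappa` and the labels are made up for illustration and are not taken from the study's dataset; MAE is not covered here.

```python
from collections import Counter

def accuracy_and_kappa(y_true, y_pred):
    """Return (observed agreement, Cohen's kappa) for two label sequences."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: product of the marginal frequencies, summed over classes.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    # Note: undefined (division by zero) when chance agreement is exactly 1.
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# Hypothetical labels: 5 true "yes" and 5 true "no".
y_true = ["yes"] * 5 + ["no"] * 5
y_pred = ["yes", "yes", "yes", "yes", "no", "no", "no", "no", "yes", "yes"]
acc, kappa = accuracy_and_kappa(y_true, y_pred)
print(round(acc, 3), round(kappa, 3))
```

Here 7 of 10 predictions agree with the truth, and chance agreement is 0.5, so kappa is about 0.4. Kappa is useful alongside accuracy because it discounts agreement that a classifier could reach just by following the class frequencies.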

