Performance Analysis of the KNN Method on a Breast Cancer Patient Dataset [Analisis performa metode Knn pada Dataset pasien pengidap Kanker Payudara]

Dewi Cahyanti; Alifah Rahmayani; Syafira Ainy Husniar

doi:10.33096/ijodas.v1i2.13

Indonesian Journal of Data and Science ◽

10.33096/ijodas.v1i2.13 ◽

2020 ◽

Vol 1 (2) ◽

pp. 39-43

Author(s):

Dewi Cahyanti ◽

Alifah Rahmayani ◽

Syafira Ainy Husniar

Keyword(s):

Nearest Neighbor ◽

K Nearest Neighbor ◽

F Measure

Abstract: Breast cancer is a non-skin disease that arises from the gland cells, gland ducts, and supporting tissue of the breast. This paper uses the K-Nearest Neighbor (KNN) method to classify the dataset. K-Nearest Neighbor classifies an object based on the training data that lie closest to it. The study applies KNN to a dataset of breast cancer patients, with k=3 to k=5 and 5-fold cross-validation. In testing, KNN reached its highest accuracy of 0.93 on the fourth 20% fold (k=3), the first 20% fold (k=4), and the first 20% fold (k=5); its highest precision of 0.97 on the fourth 20% fold (k=3); its highest recall of 0.98 on the third 20% fold (k=3); and its highest F-measure of 0.94 on the fourth 20% fold (k=3) and the third 20% fold (k=5).
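
The evaluation protocol described in this abstract (KNN with k=3 to k=5, scored on each of five 20% folds) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation: the data are synthetic stand-ins rather than the breast cancer dataset, Euclidean distance is an assumption, and the helper names `knn_predict`, `scores`, and `cross_validate` are hypothetical.

```python
import math
import random
from collections import Counter

def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training points nearest to x (Euclidean)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def scores(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F-measure for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    acc = sum(t == p for t, p in pairs) / len(pairs)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return acc, prec, rec, f1

def cross_validate(X, y, k, n_folds=5):
    """Score each held-out fold in turn (each fold is 20% when n_folds=5)."""
    idx = list(range(len(X)))
    folds = [idx[f::n_folds] for f in range(n_folds)]
    results = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        pred = [knn_predict(train_X, train_y, X[i], k) for i in test_idx]
        results.append(scores([y[i] for i in test_idx], pred))
    return results

# Synthetic stand-in data (NOT the paper's dataset): two Gaussian blobs.
rng = random.Random(0)
X = [[rng.gauss(c, 1.0), rng.gauss(c, 1.0)] for c in (0, 3) for _ in range(50)]
y = [label for label in (0, 1) for _ in range(50)]

for k in (3, 4, 5):
    for fold, (acc, prec, rec, f1) in enumerate(cross_validate(X, y, k), start=1):
        print(f"k={k} fold={fold} acc={acc:.2f} prec={prec:.2f} rec={rec:.2f} f={f1:.2f}")
```

Reporting the best fold per metric, as the abstract does, amounts to taking the maximum of each column of this output. With an even k such as 4, votes can tie; this sketch breaks ties in favor of the class of the nearest neighbor, which is a design choice and not something the abstract specifies.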

Text Mining and Sentiment Clustering on Online Store Product Reviews [Text Mining dan Klasterisasi Sentimen Pada Ulasan Produk Toko Online]

Jurnal Teknologi dan Ilmu Komputer Prima (JUTIKOMP) ◽

10.34012/jutikomp.v2i1.456 ◽

2019 ◽

Vol 2 (1) ◽

pp. 41-48

Author(s):

Rimbun Siringoringo ◽

Jamaludin Jamaludin

Keyword(s):

Text Mining ◽

Nearest Neighbor ◽

K Nearest Neighbor ◽

F Measure

The growth of social media and e-commerce has changed how people interact and express views, opinions, and moods. Product reviews are one way consumers express opinions and sentiment about a product online, and they now play a very important role in shaping consumer interest in a product. Sentiment analysis is a widely used approach for extracting information and mining opinions from product reviews. It faces several challenges: first, the sentiment produced by prediction models often differs from the actual sentiment; second, the way consumers express sentiment and mood varies from one situation to the next. This study performs sentiment analysis on reviews of Trendy Shoes products of the Denim brand. The analysis consists of data collection, preprocessing, data transformation, feature selection, and classification using a Support Vector Machine. Preprocessing applies the text mining steps of case folding, non-alphanumeric removal, stop word removal, and stemming. The results are measured by Accuracy, G-Mean, and F-Measure. Tests on three types of sentiment data show that the Support Vector Machine classifies sentiment well. Its performance was compared with the K-Nearest Neighbor method, and the Support Vector Machine outperformed K-Nearest Neighbor.
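
The four preprocessing steps named in this abstract can be sketched as a small pipeline. This is only an illustration under stated assumptions: the stop-word list is a tiny hypothetical one, the stemmer is a crude suffix stripper standing in for a real stemming algorithm, and the example sentence is invented (the study's reviews and its actual tools are not given in the abstract).

```python
import re

# Hypothetical, tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "is", "and", "are", "very", "these"}
# Crude suffix stripping as a stand-in for a real stemmer (assumption).
SUFFIXES = ("ing", "ed", "s")

def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(review):
    text = review.lower()                                      # case folding
    text = re.sub(r"[^a-z0-9\s]", " ", text)                   # non-alphanumeric removal
    tokens = [t for t in text.split() if t not in STOP_WORDS]  # stop word removal
    return [stem(t) for t in tokens]                           # stemming

print(preprocess("These shoes are VERY comfortable, and the delivery arrived quickly!"))
# ['shoe', 'comfortable', 'delivery', 'arriv', 'quickly']
```

The resulting token lists would then feed the data transformation and feature selection stages before classification.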

K-Nearest Neighbor Berbasis Particle Swarm Optimization untuk Analisis Sentimen Terhadap Tokopedia

Jurnal Teknik Informatika dan Sistem Informasi ◽

10.28932/jutisi.v6i2.2658 ◽

2020 ◽

Vol 6 (2) ◽

Author(s):

Dicki Pajri ◽

Yuyun Umaidah ◽

Tesa Nur Padilah

Keyword(s):

Particle Swarm Optimization ◽

Nearest Neighbor ◽

Confusion Matrix ◽

Particle Swarm ◽

K Nearest Neighbor ◽

Swarm Optimization ◽

Evaluation Result ◽

Source Of Information ◽

F Measure ◽

Evaluation Accuracy

Tokopedia is a popular e-commerce marketplace in Indonesia. Customers’ perceptions of Tokopedia expressed on Twitter are an important source of information that can be processed into useful insights. Sentiment analysis using K-Nearest Neighbor based on Particle Swarm Optimization is one way to process these perceptions. The purpose of this study is to classify customers’ perceptions into positive, neutral, and negative classes. Testing is carried out with four different scenarios and k values, which are evaluated using a confusion matrix. The best evaluation result, 88.11%, was obtained with a 90:10 dataset split and k = 1. Feature selection was then applied to these results using Particle Swarm Optimization with 20 iterations and 10 particles. It produced the best evaluation accuracy of 97.9%, with 96.17% precision, 96.62% recall, and 96.39% f-measure.
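
As a quick illustration of how these figures relate, the sketch below derives per-class precision, recall, and F-measure from a confusion matrix and checks that the reported F-measure is the harmonic mean of the reported precision and recall. The three-class matrix holds hypothetical counts, not the study's data, and the function names are illustrative.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

def per_class_metrics(matrix, labels):
    """matrix[i][j] = number of items of true class i predicted as class j."""
    out = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column total
        actual = sum(matrix[i])                    # row total
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        out[label] = (precision, recall, f_measure(precision, recall))
    return out

# Hypothetical counts for illustration only.
labels = ["positive", "neutral", "negative"]
matrix = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 9],
]
for label, (p, r, f) in per_class_metrics(matrix, labels).items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f-measure={f:.2f}")

# The abstract's reported precision and recall reproduce its f-measure.
print(round(f_measure(96.17, 96.62), 2))  # 96.39
```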

COMPARATIVE STUDY OF CLASSIFICATION ALGORITHMS: HOLDOUTS AS ACCURACY ESTIMATION

CogITo Smart Journal ◽

10.31154/cogito.v1i1.2.13-23 ◽

2016 ◽

Vol 1 (1) ◽

pp. 13 ◽

Cited By ~ 1

Author(s):

Debby Erce Sondakh

Keyword(s):

Decision Tree ◽

Nearest Neighbor ◽

Naive Bayes ◽

Decision Rules ◽

Naïve Bayes ◽

Support Vector ◽

Classification Algorithms ◽

K Nearest Neighbor ◽

Accuracy Estimation ◽

F Measure

This study measures and compares the performance of five machine-learning text classification algorithms, namely decision rules, decision tree, k-nearest neighbor (k-NN), naïve Bayes, and Support Vector Machine (SVM), on multi-class text documents. The comparison covers the algorithms' effectiveness, that is, their ability to assign documents to the correct category, using the holdout (percentage split) method. Effectiveness is measured by precision, recall, F-measure, and accuracy. The experiments show that for naïve Bayes, the larger the percentage of training documents, the higher the accuracy of the resulting model. Naïve Bayes reached its highest accuracy at a 90/10 split, SVM at 80/20, and decision tree at 70/30. The experiments also show that naïve Bayes had the highest effectiveness among the five algorithms tested and the fastest model-building time, 0.02 seconds. The decision tree classified text documents with higher accuracy than SVM, but took longer to build its model. In terms of model-building time, k-NN was the fastest, but its accuracy was lower.
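
The holdout (percentage split) procedure mentioned here can be sketched as a single shuffled split of the documents into training and test portions. This is a minimal illustration: the function name `holdout_split`, the fixed seed, and the placeholder document list are assumptions, and the abstract does not say whether the study shuffled or stratified its data.

```python
import random

def holdout_split(items, train_fraction, seed=0):
    """Shuffle a copy of items and cut it into (train, test) portions."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Placeholder documents standing in for a labeled text corpus.
docs = [f"doc{i}" for i in range(100)]
for pct in (90, 80, 70):
    train, test = holdout_split(docs, pct / 100)
    print(f"{pct}/{100 - pct} split: {len(train)} train, {len(test)} test")
```

Unlike k-fold cross-validation, each document is used for testing at most once, so the accuracy estimate depends on the particular split; comparing several split percentages, as the study does, shows how sensitive each algorithm is to the amount of training data.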

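CLASSIFICATION AND IDENTIFICATION OF COLONY COUNTS IN BACTERIAL IMAGES USING THE K-NEAREST NEIGHBOR METHOD [KLASIFIKASI DAN IDENTIFIKASI JUMLAH KOLONI PADA CITRA BAKTERI DENGAN METODE K-NEAREST NEIGHBOR]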

MATICS ◽

10.18860/mat.v8i2.3723 ◽

2016 ◽

Vol 8 (2) ◽

pp. 76

Author(s):

Ihsan Ihsan

Keyword(s):

Nearest Neighbor ◽

Lactobacillus Bulgaricus ◽

K Nearest Neighbor ◽

Contrast Stretching ◽

Image Pretreatment ◽

F Measure ◽

Photo Image

Abstract – This study proposes a system for classifying and counting bacterial colonies from a photo image of bacteria. The system uses several image pretreatment processes, including Contrast Stretching, Extended-Maxima Transform, and Regionprops. The main purpose of the system is to determine the category of bacterial colonies in quantities too large to be handled manually. To build the algorithm, the necessary features, such as diameter, perimeter, and roundness, must first be determined; the categories are then assigned using KNN (K-Nearest Neighbor). The study classifies three types of bacteria, Lactobacillus bulgaricus, Streptococcus thermophilus, and Bifidobacterium, with a precision of 97.97% and an F-measure of 87.09%.

Keywords: Contrast Stretching, Lactobacillus, Regionprops, K-Nearest Neighbor

Investigating Machine Learning Approaches for Bitcoin Ransomware Payment Detection Systems

International Journal of Innovative Science and Research Technology ◽

10.38124/ijisrt20sep784 ◽

2020 ◽

Vol 5 (9) ◽

pp. 1216-1222

Author(s):

Kirat Jadhav

Keyword(s):

Machine Learning ◽

Detection Rate ◽

Nearest Neighbor ◽

Gradient Boosting ◽

Learning Approaches ◽

K Nearest Neighbor ◽

Detection Systems ◽

Digital World ◽

F Measure ◽

Machine Learning Models

Cryptocurrencies have revolutionized the process of trading in the digital world. Roughly one decade after the first bitcoin block was introduced, thousands of cryptocurrencies exist. The anonymity offered by cryptocurrencies has also attracted the perpetrators of cybercrime. This paper examines different machine learning approaches for efficiently identifying ransomware payments made to operators through bitcoin transactions. Machine learning models may be developed based on patterns that differentiate such cybercrime operations from normal bitcoin transactions, in order to identify and report attacks. The machine learning approaches are evaluated on a bitcoin ransomware dataset. Experimental results show that the Gradient Boosting and XGBoost algorithms achieved better detection rates in terms of precision, recall, and F-measure than the k-Nearest Neighbor, Random Forest, Naïve Bayes, and Multilayer Perceptron approaches.

Neighbor Weighted K-Nearest Neighbor for Sambat Online Classification

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v12.i1.pp155-160 ◽

2018 ◽

Vol 12 (1) ◽

pp. 155

Author(s):

Annisya Aprilia Prasanti ◽

M. Ali Fauzi ◽

Muhammad Tanzil Furqon

Keyword(s):

Feature Extraction ◽

Classification System ◽

Nearest Neighbor ◽

Imbalanced Data ◽

City Government ◽

K Nearest Neighbor ◽

Online Classification ◽

N Gram ◽

F Measure

Sambat Online is an e-government implementation for complaint management provided by the Malang City Government. Each complaint is classified into its intended department. In this study, an automatic complaint classification system using Neighbor Weighted K-Nearest Neighbor (NW-KNN) is proposed because Sambat Online has imbalanced data. The system consists of three main stages: preprocessing, N-Gram feature extraction, and classification using NW-KNN. The experimental results show that the NW-KNN algorithm classifies the imbalanced data well, with an optimal k-neighbor value of 3 and unigrams as the best features, yielding 77.85% precision, 74.18% recall, and 75.25% f-measure. Compared with conventional KNN, NW-KNN also proved better for imbalanced data problems, though only by a very slight margin.
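
The idea behind neighbor weighting for imbalanced data can be sketched as below. The abstract does not state its formulas, so this follows one commonly cited formulation as an assumption: each class receives a weight of 1 / (class size / smallest class size)^(1/exponent), and each of the k nearest neighbors votes with its cosine similarity multiplied by its class weight. The unigram bag-of-words features, the `exponent` value, the function names, and the toy complaints are all hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-count Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def class_weights(labels, exponent=2.0):
    """Smaller classes get larger weights; the smallest class gets weight 1."""
    counts = Counter(labels)
    smallest = min(counts.values())
    return {c: 1.0 / (n / smallest) ** (1.0 / exponent) for c, n in counts.items()}

def nwknn_predict(train_docs, train_labels, doc, k=3, exponent=2.0):
    weights = class_weights(train_labels, exponent)
    query = Counter(doc.split())  # unigram counts
    sims = sorted(
        ((cosine(query, Counter(d.split())), lab) for d, lab in zip(train_docs, train_labels)),
        reverse=True,
    )[:k]
    score = Counter()
    for sim, lab in sims:
        score[lab] += weights[lab] * sim  # similarity vote scaled by class weight
    return max(score, key=score.get)

# Toy, invented complaints: "roads" is the majority class, "civil" the minority.
train_docs = [
    "road damaged hole",
    "road lamp broken",
    "road flood drain",
    "street road repair",
    "id card service slow",
]
train_labels = ["roads", "roads", "roads", "roads", "civil"]

print(class_weights(train_labels))
print(nwknn_predict(train_docs, train_labels, "id card road"))  # civil
```

In this toy case the three nearest neighbors are one "civil" complaint and two "roads" complaints, so an unweighted majority vote would pick "roads"; down-weighting the majority class lets the closer minority-class neighbor decide.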

Webpage Recommendation System Using Interesting Subgraphs and Laplace Based k-Nearest Neighbor

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001420530031 ◽

2019 ◽

Vol 34 (03) ◽

pp. 2053003

Author(s):

N. Jayalakshmi ◽

P. Padmaja ◽

G. Jaya Suma

Keyword(s):

Recommendation System ◽

Nearest Neighbor ◽

Cuckoo Search ◽

Research Area ◽

Frequent Patterns ◽

K Nearest Neighbor ◽

Recommendation Algorithm ◽

Significant Information ◽

Log Files ◽

F Measure

An interesting research area that permits the user to mine the significant information, called frequent subgraph, is Graph-Based Data Mining (GBDM). One of the well-known algorithms developed to extract frequent patterns is GASTON algorithm. Retrieving the interesting webpages from the log files contributes heavily to various applications. In this work, a webpage recommendation system has been proposed by introducing Chronological Cuckoo Search (Chronological-CS) algorithm and the Laplace correction based k-Nearest Neighbor (LKNN) to retrieve the useful webpage from the interesting webpage. Initially, W-Gaston algorithm extracts the interesting subgraph from the log files and provides it to the proposed webpage recommendation system. The interesting subgraphs subjected to clustering with the proposed Chronological-CS algorithm, which is developed by integrating the chronological concept into Cuckoo Search (CS) algorithm, provide various cluster groups. Then, the proposed LKNN algorithm recommends the webpage from the clusters. Simulation of the proposed webpage recommendation algorithm is done by utilizing the data from MSNBC and weblog database. The results are compared with various existing webpage recommendation models and analyzed based on precision, recall, and F-measure. The proposed webpage recommendation model achieved better performance than the existing models with the values of 0.9194, 0.8947, and 0.86736, respectively, for the precision, recall, and F-measure.

Real Time Smartphone Data for Prediction of Nomophobia Severity using Supervised Machine Learning

10.21467/proceedings.114.11 ◽

2021 ◽

Author(s):

Anshika Arora ◽

Pinaki Chakraborty ◽

M.P.S. Bhatia

Keyword(s):

Machine Learning ◽

Real Time ◽

Undergraduate Students ◽

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

Supervised Machine Learning ◽

Support Vector ◽

K Nearest Neighbor ◽

F Measure

Excessive use of smartphones throughout the day, with dependence on them for social interaction, entertainment, and information retrieval, may lead users to develop nomophobia, which makes them feel anxious when their smartphones are unavailable. This study describes the usefulness of real-time smartphone usage data for predicting nomophobia severity using machine learning. Data were collected from 141 undergraduate students, capturing their perceptions of their smartphones with the Nomophobia Questionnaire (NMP-Q) and their real-time smartphone usage patterns with a purpose-built Android application. Supervised machine learning models, including Random Forest, Decision Tree, Support Vector Machines, Naïve Bayes, and K-Nearest Neighbor, are trained on two feature sets: the first comprises only the NMP-Q features, and the second comprises real-time smartphone usage features along with the NMP-Q features. The models are evaluated using F-measure and area under the ROC curve, and all of them perform better when given smartphone usage features along with the NMP-Q features. Naïve Bayes outperforms the other models in predicting nomophobia, achieving an F-measure of 0.891 and an ROC area of 0.933.

Application of the K-Nearest Neighbor Classification Method to a Diabetes Patient Dataset [Penerapan Metode Klasifikasi K-Nearest Neigbor pada Dataset Penderita Penyakit Diabetes]

Indonesian Journal of Data and Science ◽

10.33096/ijodas.v1i2.11 ◽

2020 ◽

Vol 1 (2) ◽

pp. 29-33

Author(s):

Andi Maulida Argina

Keyword(s):

Nearest Neighbor ◽

K Nearest Neighbor ◽

F Measure

Diabetes is a long-lasting, chronic disease marked by blood sugar (glucose) levels that are high or above normal. If diabetes is not well controlled, ... Testing the performance of various methods on a dataset is one way to choose an appropriate classification method; the problem addressed in this study is how to measure the performance of a classification method on a dataset of diabetes patients. The method used is the K-Nearest Neighbor (KNN) algorithm, which classifies an object based on the training data that lie closest to it. In the final results of this study, the highest accuracy was 39% at K=3, the highest precision 65% at K=3 and K=5, the highest recall 36% at K=3, and the highest F-Measure 46% at K=3.

Heterogeneous Ensemble with Combined Dimensionality Reduction for Social Spam Detection

International Journal of Interactive Mobile Technologies (iJIM) ◽

10.3991/ijim.v15i17.19915 ◽

2021 ◽

Vol 15 (17) ◽

pp. 84 ◽

Cited By ~ 1

Author(s):

Abdulfatai Ganiyu Oladepo ◽

Amos Orenyi Bajeh ◽

Abdullateef Oluwagbemiga Balogun ◽

Hammed Adeleye Mojeed ◽

Abdulsalam Abiodun Salman ◽

...

Keyword(s):

Dimensionality Reduction ◽

Nearest Neighbor ◽

Information Gain ◽

Area Under The Curve ◽

Principal Component ◽

Ensemble Method ◽

Spam Detection ◽

K Nearest Neighbor ◽

Heterogeneous Ensemble ◽

F Measure

This study presents a novel framework based on a heterogeneous ensemble method and a hybrid dimensionality reduction technique for spam detection in micro-blogging social networks. A hybrid of Information Gain (IG) and Principal Component Analysis (PCA) (dimensionality reduction) was implemented for the selection of important features, and a heterogeneous ensemble consisting of Naïve Bayes (NB), K Nearest Neighbor (KNN), Logistic Regression (LR), and Repeated Incremental Pruning to Produce Error Reduction (RIPPER) classifiers, combined by Average of Probabilities (AOP), was used for spam detection. The proposed framework was applied to the MPI_SWS and SAC’13 Tip spam datasets, and the developed models were evaluated on accuracy, precision, recall, f-measure, and area under the curve (AUC). In the experimental results, the proposed framework (that is, Ensemble + IG + PCA) outperformed the other methods tested on the studied spam datasets. Specifically, the proposed method had an average accuracy of 87.5%, an average precision of 0.877, an average recall of 0.845, an average F-measure of 0.872, and an average AUC of 0.943. The proposed method also performed better than some existing methods. Consequently, this study has shown that addressing high dimensionality in spam datasets, in this case with a hybrid of IG and PCA, together with a heterogeneous ensemble method can produce a more effective method for detecting spam content.
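
The Average of Probabilities combination rule named above can be sketched as follows: each base classifier outputs class probabilities, the ensemble averages them per class, and the class with the highest mean wins. The probability values below are hypothetical stand-ins for the outputs of the four base classifiers on one message, and the function name is illustrative rather than taken from the paper.

```python
def average_of_probabilities(prob_rows):
    """prob_rows: one dict per base classifier, mapping class -> probability."""
    classes = prob_rows[0].keys()
    avg = {c: sum(row[c] for row in prob_rows) / len(prob_rows) for c in classes}
    return max(avg, key=avg.get), avg

# Hypothetical outputs of four base classifiers (e.g. NB, KNN, LR, RIPPER).
outputs = [
    {"spam": 0.90, "ham": 0.10},
    {"spam": 0.40, "ham": 0.60},
    {"spam": 0.45, "ham": 0.55},
    {"spam": 0.70, "ham": 0.30},
]
label, avg = average_of_probabilities(outputs)
print(label, avg)
```

Here a plain majority vote would tie at two classifiers each, whereas averaging lets the more confident classifiers tip the decision toward "spam" (mean probability about 0.61).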
