Analisis performa metode Knn pada Dataset pasien pengidap Kanker Payudara (Performance analysis of the KNN method on a dataset of breast cancer patients)

2020 ◽  
Vol 1 (2) ◽  
pp. 39-43
Author(s):  
Dewi Cahyanti ◽  
Alifah Rahmayani ◽  
Syafira Ainy Husniar

Abstract: Breast cancer is a non-skin disease that originates from the gland cells, gland ducts, and supporting tissue of the breast. This paper uses the K-Nearest Neighbor (KNN) method to classify the dataset. KNN classifies an object based on the training data closest to it in distance. The study applies KNN to a dataset of breast cancer patients with k = 3 to k = 5 and 5-fold cross-validation. In testing, KNN achieved its highest accuracy, 0.93, on the fourth 20% fold (k = 3), the first 20% fold (k = 4), and the first 20% fold (k = 5); its highest precision, 0.97, on the fourth 20% fold (k = 3); its highest recall, 0.98, on the third 20% fold (k = 3); and its highest F-measure, 0.94, on the fourth 20% fold (k = 3) and the third 20% fold (k = 5).
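The evaluation described above (KNN with k = 3 to 5, scored fold by fold under 5-fold cross-validation) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the Euclidean distance, the contiguous unshuffled folds, the binary positive label, and the toy data are all assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k):
    """Label x by majority vote among its k nearest training points (Euclidean, assumed)."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def fold_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F-measure for one test fold."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f_measure

def cross_validate(X, y, k, n_folds=5):
    """Hold out each contiguous fold in turn (20% of the data each when n_folds=5)."""
    fold_size = len(X) // n_folds
    results = []
    for i in range(n_folds):
        lo = i * fold_size
        hi = (i + 1) * fold_size if i < n_folds - 1 else len(X)
        test_X, test_y = X[lo:hi], y[lo:hi]
        train_X, train_y = X[:lo] + X[hi:], y[:lo] + y[hi:]
        preds = [knn_predict(train_X, train_y, x, k) for x in test_X]
        results.append(fold_metrics(test_y, preds))
    return results

# Made-up data: two well-separated clusters, interleaved so every fold holds both classes.
X = [[(i % 2) * 10 + (i % 5) * 0.1, (i % 2) * 10.0] for i in range(20)]
y = [i % 2 for i in range(20)]
for k in (3, 4, 5):
    print(f"k={k}:", cross_validate(X, y, k))
```

Reporting the metrics per fold, rather than only their mean, mirrors how the abstract quotes the best value for each 20% portion of the data.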

2019 ◽  
Vol 2 (1) ◽  
pp. 41-48
Author(s):  
Rimbun Siringoringo ◽  
Jamaludin Jamaludin

The growth of social media and e-commerce has changed how people interact and express views, opinions, and moods. Product reviews are one way consumers convey opinions and sentiment about a product online, and they now play a very important role in shaping consumer interest in a product. Sentiment analysis is a widely used approach for extracting information and mining opinions from product reviews. It faces several challenges: first, the sentiment produced by predictive models often differs from the actual sentiment; second, the way consumers express sentiment and mood varies from one situation to the next. This study performs sentiment analysis on reviews of Trendy Shoes footwear from the Denim brand. The analysis consists of data collection, preprocessing, data transformation, feature selection, and classification using a Support Vector Machine (SVM). Preprocessing applies the text mining steps of case folding, non-alphanumeric removal, stop word removal, and stemming. Results are measured by accuracy, G-mean, and F-measure. Tests on three types of sentiment data show that the SVM classifies sentiment well. Its performance was compared with that of the K-Nearest Neighbor method, and the SVM outperformed K-Nearest Neighbor in sentiment classification.
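The four preprocessing steps named above (case folding, non-alphanumeric removal, stop word removal, stemming) might look like the sketch below. It is only an illustration: the stop word list and the suffix-stripping stemmer are hypothetical toy stand-ins in English, whereas the study would have needed resources suited to its own review language.

```python
import re

# Hypothetical stop word list and suffix set, chosen only for this example.
STOP_WORDS = {"the", "is", "and", "a", "are", "very"}
SUFFIXES = ("ing", "ed", "s")

def stem(token):
    """Toy stemmer: strip one known suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[:-len(suffix)]
    return token

def preprocess(review):
    text = review.lower()                                      # case folding
    text = re.sub(r"[^a-z0-9\s]", " ", text)                   # non-alphanumeric removal
    tokens = [t for t in text.split() if t not in STOP_WORDS]  # stop word removal
    return [stem(t) for t in tokens]                           # stemming

print(preprocess("The shoes are VERY comfortable, and shipping was fast!!"))
```

The resulting token list is what a later transformation step (for example, term weighting) would turn into feature vectors for the classifier.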


Author(s):  
Dicki Pajri ◽  
Yuyun Umaidah ◽  
Tesa Nur Padilah

Tokopedia is a popular e-commerce marketplace in Indonesia. Customers' perceptions of Tokopedia expressed on Twitter are an important source of information that can be processed into useful insights. Sentiment analysis using K-Nearest Neighbor based on Particle Swarm Optimization (PSO) is one way to process these perceptions. The purpose of this study is to classify customers' perceptions into positive, neutral, and negative classes. Testing was carried out with four different scenarios and k values, and evaluated using a confusion matrix. The best result, 88.11%, was obtained with a 90:10 dataset split and k = 1. Feature selection was then performed using PSO with 20 iterations and 10 particles, which produced the best evaluation results: 97.9% accuracy, 96.17% precision, 96.62% recall, and 96.39% F-measure.


2016 ◽  
Vol 1 (1) ◽  
pp. 13 ◽  
Author(s):  
Debby Erce Sondakh

This study measures and compares the performance of five machine-learning text classification algorithms, namely decision rules, decision tree, k-nearest neighbor (k-NN), naïve Bayes, and Support Vector Machine (SVM), on multi-class text documents. The comparison focuses on the algorithms' effectiveness, that is, their ability to assign documents to the correct category, using the holdout (percentage split) method. Effectiveness is measured by precision, recall, F-measure, and accuracy. The experiments show that, for naïve Bayes, the larger the percentage of training documents, the higher the accuracy of the resulting model. The highest accuracy was obtained at a 90/10 split for naïve Bayes, 80/20 for SVM, and 70/30 for decision tree. The experiments also show that naïve Bayes had the highest effectiveness among the five algorithms tested and the fastest model-building time, 0.02 seconds. Decision tree classified text documents with higher accuracy than SVM, but took longer to build its model. In terms of model-building time, k-NN was the fastest, but its accuracy was lower.
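The holdout (percentage split) procedure mentioned above can be sketched as below for the 90/10, 80/20, and 70/30 splits. The function name, the fixed random seed, and the toy documents are assumptions made for illustration; the abstract does not say how the documents were ordered or sampled.

```python
import random

def percentage_split(documents, labels, train_fraction, seed=0):
    """Shuffle (document, label) pairs, then cut them into training and test parts."""
    paired = list(zip(documents, labels))
    random.Random(seed).shuffle(paired)   # fixed seed so the split is repeatable
    cut = round(len(paired) * train_fraction)
    return paired[:cut], paired[cut:]

# Made-up corpus: ten documents spread over three categories.
docs = [f"doc{i}" for i in range(10)]
cats = [i % 3 for i in range(10)]
for fraction in (0.9, 0.8, 0.7):
    train, test = percentage_split(docs, cats, fraction)
    print(f"{fraction:.0%} training: {len(train)} train, {len(test)} test")
```

Each classifier would then be trained on the first part and scored on the second, once per split ratio.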


MATICS ◽  
2016 ◽  
Vol 8 (2) ◽  
pp. 76
Author(s):  
Ihsan Ihsan

Abstract: This study proposes a system for classifying and counting bacterial colonies from a photographic image of the bacteria. The system uses several image preprocessing steps, including Contrast Stretching, Extended-Maxima Transform, and Regionprops. Its main purpose is to determine the category of bacterial colonies in quantities too large to process manually. To build the algorithm, the necessary features, such as diameter, perimeter, and roundness, must first be determined; categories are then assigned using KNN (K-Nearest Neighbor). The system classifies three types of bacteria, Lactobacillus bulgaricus, Streptococcus thermophilus, and Bifidobacterium, with a precision of 97.97% and an F-measure of 87.09%.

Keywords: Contrast Stretching, Lactobacillus, Regionprops, K-Nearest Neighbor


Author(s):  
Kirat Jadhav

Cryptocurrencies have revolutionized trading in the digital world. In roughly one decade since the introduction of the first bitcoin block, thousands of cryptocurrencies have been introduced. The anonymity that cryptocurrencies offer has also attracted the perpetrators of cybercrime. This paper examines different machine learning approaches for efficiently identifying ransomware payments made to operators through bitcoin transactions. Machine learning models can be developed from patterns that distinguish such cybercrime operations from normal bitcoin transactions, in order to identify and report attacks. The approaches are evaluated on a bitcoin ransomware dataset. Experimental results show that the Gradient Boosting and XGBoost algorithms achieved better detection rates in terms of precision, recall, and F-measure than the k-Nearest Neighbor, Random Forest, Naïve Bayes, and Multilayer Perceptron approaches.


Author(s):  
Annisya Aprilia Prasanti ◽  
M. Ali Fauzi ◽  
Muhammad Tanzil Furqon

Sambat Online is an e-government complaint management service provided by the Malang City Government. Every complaint is classified to its intended department. This study proposes an automatic complaint classification system using Neighbor Weighted K-Nearest Neighbor (NW-KNN), because the Sambat Online data are imbalanced. The system consists of three main stages: preprocessing, N-gram feature extraction, and classification using NW-KNN. The experimental results show that NW-KNN classifies the imbalanced data well, with k = 3 as the optimal number of neighbors and unigrams as the best features, giving 77.85% precision, 74.18% recall, and 75.25% F-measure. Compared with conventional KNN, NW-KNN also performed better on the imbalanced data, though only by a very slight margin.
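The N-gram feature extraction stage mentioned above can be illustrated with a short sketch that counts word unigrams and bigrams in a complaint. The whitespace tokenization, the function names, and the sample complaint are assumptions; the NW-KNN classification stage itself is not shown.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_counts(text, n):
    """Count word n-grams in a lowercased, whitespace-tokenized text."""
    return Counter(ngrams(text.lower().split(), n))

# Made-up complaint text, used only to show the shape of the features.
complaint = "street lamp broken near street market"
print(ngram_counts(complaint, 1))   # unigram features
print(ngram_counts(complaint, 2))   # bigram features
```

Unigrams give fewer, denser features than higher-order N-grams, which may be one reason they can work well on short complaint texts.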


Author(s):  
N. Jayalakshmi ◽  
P. Padmaja ◽  
G. Jaya Suma

Graph-Based Data Mining (GBDM) is a research area that lets the user mine significant information in the form of frequent subgraphs. One of the well-known algorithms developed to extract frequent patterns is the GASTON algorithm. Retrieving interesting webpages from log files contributes heavily to various applications. In this work, a webpage recommendation system is proposed that introduces the Chronological Cuckoo Search (Chronological-CS) algorithm and the Laplace correction based k-Nearest Neighbor (LKNN) to retrieve useful webpages from the interesting webpages. Initially, the W-Gaston algorithm extracts the interesting subgraphs from the log files and passes them to the proposed webpage recommendation system. The interesting subgraphs are then clustered with the proposed Chronological-CS algorithm, which is developed by integrating the chronological concept into the Cuckoo Search (CS) algorithm, to produce various cluster groups. The proposed LKNN algorithm then recommends webpages from the clusters. The proposed algorithm is simulated using data from the MSNBC and weblog databases. The results are compared with various existing webpage recommendation models and analyzed in terms of precision, recall, and F-measure. The proposed model outperformed the existing models, with values of 0.9194, 0.8947, and 0.86736 for precision, recall, and F-measure, respectively.


2021 ◽  
Author(s):  
Anshika Arora ◽  
Pinaki Chakraborty ◽  
M.P.S. Bhatia

Excessive use of smartphones throughout the day, with dependence on them for social interaction, entertainment, and information retrieval, may lead users to develop nomophobia, which makes them feel anxious when a smartphone is unavailable. This study describes the usefulness of real-time smartphone usage data for predicting nomophobia severity using machine learning. Data were collected from 141 undergraduate students, capturing their perception of their smartphone use through the Nomophobia Questionnaire (NMP-Q) and their real-time smartphone usage patterns through a purpose-built Android application. Supervised machine learning models, including Random Forest, Decision Tree, Support Vector Machines, Naïve Bayes, and K-Nearest Neighbor, were trained on two feature sets: the first comprises only the NMP-Q features, and the second comprises real-time smartphone usage features along with the NMP-Q features. The models were evaluated using F-measure and area under the ROC curve, and all of them performed better when given the smartphone usage features along with the NMP-Q features. Naïve Bayes outperformed the other models in predicting nomophobia, achieving an F-measure of 0.891 and an ROC area of 0.933.


2020 ◽  
Vol 1 (2) ◽  
pp. 29-33
Author(s):  
Andi Maulida Argina

Diabetes is a long-lasting, or chronic, disease characterized by blood sugar (glucose) levels that are high or above normal. If diabetes is not well controlled, [...] Testing the performance of various methods on a dataset is one way to choose an appropriate classification method; the problem addressed in this study is how to measure the performance of a classification method on a dataset of diabetes patients. The method used is the K-Nearest Neighbor (KNN) algorithm, which classifies an object based on the training data closest to it in distance. The final results show a highest accuracy of 39% at K=3, a highest precision of 65% at K=3 and K=5, a highest recall of 36% at K=3, and a highest F-measure of 46% at K=3.


Author(s):  
Abdulfatai Ganiyu Oladepo ◽  
Amos Orenyi Bajeh ◽  
Abdullateef Oluwagbemiga Balogun ◽  
Hammed Adeleye Mojeed ◽  
Abdulsalam Abiodun Salman ◽  
...  

This study presents a novel framework based on a heterogeneous ensemble method and a hybrid dimensionality reduction technique for spam detection in micro-blogging social networks. A hybrid of Information Gain (IG) and Principal Component Analysis (PCA) was implemented as the dimensionality reduction step to select important features, and a heterogeneous ensemble of Naïve Bayes (NB), K Nearest Neighbor (KNN), Logistic Regression (LR), and Repeated Incremental Pruning to Produce Error Reduction (RIPPER) classifiers, combined by Average of Probabilities (AOP), was used for spam detection. The proposed framework was applied to the MPI_SWS and SAC’13 Tip spam datasets, and the developed models were evaluated on accuracy, precision, recall, F-measure, and area under the curve (AUC). In the experiments, the proposed framework (that is, Ensemble + IG + PCA) outperformed the other methods tested on the studied spam datasets. Specifically, the proposed method had an average accuracy of 87.5%, an average precision of 0.877, an average recall of 0.845, an average F-measure of 0.872, and an average AUC of 0.943. The proposed method also performed better than some existing methods. Consequently, this study shows that addressing high dimensionality in spam datasets, in this case with a hybrid of IG and PCA, together with a heterogeneous ensemble method can produce a more effective method for detecting spam content.
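The Average of Probabilities (AOP) combination rule named above can be sketched as follows: each base classifier outputs a class probability distribution, the distributions are averaged, and the class with the highest mean probability is chosen. The function names and the probability values are made up for illustration, and the base classifiers themselves are not implemented here.

```python
def average_of_probabilities(distributions):
    """Average per-class probabilities across the base classifiers' outputs."""
    classes = distributions[0].keys()
    n = len(distributions)
    return {c: sum(d[c] for d in distributions) / n for c in classes}

def ensemble_predict(distributions):
    """Pick the class with the highest averaged probability."""
    averaged = average_of_probabilities(distributions)
    return max(averaged, key=averaged.get)

# Hypothetical outputs of four base classifiers (standing in for NB, KNN, LR, RIPPER)
# for a single message; the numbers are invented.
outputs = [
    {"spam": 0.9, "ham": 0.1},
    {"spam": 0.4, "ham": 0.6},
    {"spam": 0.7, "ham": 0.3},
    {"spam": 0.2, "ham": 0.8},
]
print(average_of_probabilities(outputs))
print(ensemble_predict(outputs))
```

In this example two classifiers favor each class, so a simple majority vote would tie, while averaging lets the more confident classifiers decide.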

