Information Retrieval Document Classified with K-Nearest Neighbor

Badruz Zaman; Endah Purwanti; Alifian Sukma

doi:10.20473/rlj.v1i2.1177

Information Retrieval Document Classified with K-Nearest Neighbor

Record and Library Journal ◽

10.20473/rlj.v1i2.1177 ◽

2016 ◽

Vol 1 (2) ◽

pp. 129

Author(s):

Badruz Zaman ◽

Endah Purwanti ◽

Alifian Sukma

Keyword(s):

Information Retrieval ◽

Evaluation System ◽

Nearest Neighbor ◽

Technology Development ◽

Health Science ◽

Cosine Similarity ◽

K Nearest Neighbor ◽

Physical Sciences ◽

Social Sciences And Humanities ◽

A Value

As technology advances rapidly, the amount of available information is growing ever more abundant. This study aimed to determine how an information retrieval system can be implemented to classify journal documents using cosine similarity and K-Nearest Neighbor (KNN). The data comprised 160 documents across categories such as Physical Sciences and Engineering, Life Science, Health Science, and Social Sciences and Humanities. Construction began with text mining preprocessing, followed by weighting each token with term frequency-inverse document frequency (TF-IDF), calculating the degree of similarity between documents with cosine similarity, and classifying with k-Nearest Neighbor. Evaluation used 20 test documents with k = {37, 41, 43}. The evaluation reported classification success at k = 43, with a precision of 0.501. System test results showed that the 20 test documents could be classified according to their actual categories.
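The pipeline described in this abstract (TF-IDF weighting, cosine similarity, k-nearest-neighbor voting) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy documents, the whitespace tokenizer, the smoothed IDF formula `log(N / df) + 1`, and all function names are assumptions made for the example.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lowercase + whitespace split; a real system would also
    # remove stop words and apply stemming during text preprocessing.
    return text.lower().split()

def tfidf_vectors(docs):
    # docs: list of token lists. Returns sparse {term: weight} vectors and the IDF table.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Assumption: smoothed IDF so that terms present in every document keep a non-zero weight.
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf

def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def knn_classify(query_tokens, train_vecs, train_labels, idf, k):
    # Weight the query with the training IDF; terms unseen in training are ignored.
    query = {t: tf * idf[t] for t, tf in Counter(query_tokens).items() if t in idf}
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda pair: cosine(query, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy corpus standing in for the labeled journal documents.
training = [
    ("cell protein gene expression", "Life Science"),
    ("gene dna cell biology", "Life Science"),
    ("patient clinical treatment hospital", "Health Science"),
    ("clinical trial patient therapy", "Health Science"),
]
labels = [label for _, label in training]
vecs, idf = tfidf_vectors([tokenize(text) for text, _ in training])
predicted = knn_classify(tokenize("gene expression in cell"), vecs, labels, idf, k=3)
print(predicted)
```

An odd k, like the values 37, 41, and 43 tested in the study, reduces the chance of a tied vote between two categories, though ties remain possible with more than two classes.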

Information Retrieval Document Classified with K-Nearest Neighbor

Record and Library Journal ◽

10.20473/rlj.v1-i2.2015.129-138 ◽

2018 ◽

Vol 1 (2) ◽

pp. 129 ◽

Cited By ~ 1

Author(s):

Alifian Sukma ◽

Badruz Zaman ◽

Endah Purwanti

Keyword(s):

Information Retrieval ◽

Evaluation System ◽

Nearest Neighbor ◽

Technology Development ◽

Health Science ◽

Cosine Similarity ◽

K Nearest Neighbor ◽

Physical Sciences ◽

Social Sciences And Humanities ◽

A Value

As technology advances rapidly, the amount of available information is growing ever more abundant. This study aimed to determine how an information retrieval system can be implemented to classify journal documents using cosine similarity and K-Nearest Neighbor (KNN). The data comprised 160 documents across categories such as Physical Sciences and Engineering, Life Science, Health Science, and Social Sciences and Humanities. Construction began with text mining preprocessing, followed by weighting each token with term frequency-inverse document frequency (TF-IDF), calculating the degree of similarity between documents with cosine similarity, and classifying with k-Nearest Neighbor. Evaluation used 20 test documents with k = {37, 41, 43}. The evaluation reported classification success at k = 43, with a precision of 0.501. System test results showed that the 20 test documents could be classified according to their actual categories.

Machine Learning Verdict of EEG Signals in Brain Computer Interface

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit1838114 ◽

2018 ◽

pp. 429-441

Author(s):

M. Jeyanthi ◽

C. Velayutham

Keyword(s):

Nearest Neighbor ◽

Technology Development ◽

Vital Role ◽

Svm Classifier ◽

K Nearest Neighbor ◽

Data Mining Technique ◽

Data Set ◽

Eeg Data ◽

Irrelevant Attributes

Brain-computer interface (BCI) technology plays a vital role in science and technology research. Classification is a data mining technique used to predict group membership for data instances. Analyzing BCI data is challenging because feature extraction and classification are more difficult for these data than for raw data. In this paper, we extract statistical Haralick features from the raw EEG data. The features are then normalized, and binning is used to improve the accuracy of the predictive models by reducing noise and eliminating some irrelevant attributes. Classification is then performed on a BCI dataset with several techniques: Naïve Bayes, a k-nearest neighbor classifier, and an SVM classifier. Finally, we propose the SVM classification algorithm for the BCI dataset.

Song Recommendations Based on Artists with Cosine Similarity Algorithms and K-Nearest Neighbor

JELIKU (Jurnal Elektronik Ilmu Komputer Udayana) ◽

10.24843/jlk.2020.v08.i04.p01 ◽

2020 ◽

Vol 8 (4) ◽

pp. 367

Author(s):

Muhammad Arief Budiman ◽

Gst. Ayu Vida Mastrika Giri

Keyword(s):

Collaborative Filtering ◽

Mobile Phones ◽

Recommendation System ◽

Nearest Neighbor ◽

Cosine Similarity ◽

Music Recommendation ◽

K Nearest Neighbor ◽

Filtering Method ◽

K Nearest Neighbor Algorithm ◽

Music Recommendation System

The music industry is growing rapidly, and music artists continue to release millions of works. Technology has followed these developments; examples include mobile phone applications with music subscription services such as Spotify, Joox, and GrooveShark. Application-based services are increasingly in demand among users for streaming music, whether free or paid. This paper proposes a music recommendation system that recommends songs based on the similarity of artists the user likes or has listened to. The research uses the Collaborative Filtering method with Cosine Similarity and the K-Nearest Neighbor algorithm. The result is a system that can recommend songs based on artists who are related to one another.

An Efficient Incremental Nearest Neighbor Algorithm for Processing k-Nearest Neighbor Queries with Visual and Semantic Predicates in Multimedia Information Retrieval System

Information Retrieval Technology - Lecture Notes in Computer Science ◽

10.1007/11562382_63 ◽

2005 ◽

pp. 653-658 ◽

Cited By ~ 1

Author(s):

Dong-Ho Lee ◽

Dong-Joo Park

Keyword(s):

Information Retrieval ◽

Multimedia Information ◽

Retrieval System ◽

Nearest Neighbor ◽

Information Retrieval System ◽

Multimedia Information Retrieval ◽

K Nearest Neighbor ◽

Nearest Neighbor Algorithm ◽

Nearest Neighbor Queries

ASPECT BASED SENTIMENT ANALYSIS DATA KUESIONER DI RUMAH SAKIT MUHAMMADIYAH LAMONGAN MENGGUNAKAN ALGORITMA K-NN.

JOUTICA ◽

10.30736/jti.v6i2.677 ◽

2021 ◽

Vol 6 (2) ◽

pp. 506

Author(s):

Mustain

Keyword(s):

Vector Space ◽

Sentiment Analysis ◽

Nearest Neighbor ◽

Vector Space Model ◽

Analysis Data ◽

Cosine Similarity ◽

K Nearest Neighbor ◽

Space Model

The difficulty of organizing conventional questionnaire data motivated this research. A system was therefore built to automatically group questionnaire data together with the sentiment it contains. The dataset used in this study is questionnaire data from Muhammadiyah Lamongan Hospital. This study handles only text questionnaires. Paper-based data were compiled and then entered into a database together with work-unit category and sentiment labels. The dataset was then preprocessed, including negation handling, case folding, tokenizing, filtering, and stemming. As test data, questionnaire comments were preprocessed, and their document similarity was then computed using the K-Nearest Neighbor method and the Vector Space Model. The amount of data handled affects system performance, especially accuracy and speed during classification. The system outputs a ranking of the documents most similar to the dataset, ordered by cosine similarity. Classification tests by category class yielded an accuracy of 91%. Tests by sentiment class yielded 94%, and the combination of the two reached an accuracy of 86%.

Approximate k-Nearest Neighbor Search Based on the Earth Mover's Distance for Efficient Content-based Information Retrieval

Proceedings of the 8th International Conference on Web Intelligence, Mining and Semantics - WIMS '18 ◽

10.1145/3227609.3227647 ◽

2018 ◽

Author(s):

Min-Hee Jang ◽

Sang-Wook Kim ◽

Woong-Kee Loh ◽

Jung-Im Won

Keyword(s):

Information Retrieval ◽

Nearest Neighbor ◽

Nearest Neighbor Search ◽

K Nearest Neighbor ◽

Earth Mover's Distance ◽

Neighbor Search ◽

The Earth ◽

K Nearest Neighbor Search

Implementasi Algoritma K-Nearest Neighbor untuk Melakukan Klasifikasi Produk dari beberapa E-marketplace

Jurnal Teknik Informatika dan Sistem Informasi ◽

10.28932/jutisi.v5i1.1581 ◽

2019 ◽

Vol 5 (1) ◽

Author(s):

Danny Sebastian

Keyword(s):

Nearest Neighbor ◽

Cosine Similarity ◽

K Nearest Neighbor ◽

Nearest Neighbor Algorithm ◽

Product Data ◽

K Nearest Neighbor Algorithm ◽

Similarity Distance ◽

Brand Product

E-marketplaces have gained popularity in Indonesian society, resulting in an increase in the number of products offered. Consequently, customers need more effort to search for products. In this study, we classified products from several e-marketplaces. The classification used the TF-IDF method for weighting, cosine similarity to calculate the similarity distance between products, and the k-nearest neighbor algorithm. In the first test, using 150 product records, the k-nearest neighbor method with k=5 correctly classified 146 records and assigned 4 to the wrong class. The value k=5 gave the best result for this case, with an accuracy of 97.33%. In the second test, using 150 mixed-brand product records, the k-nearest neighbor method correctly classified 145 records and assigned 5 to the wrong class, for an accuracy of 96.67%.
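The two accuracy figures in this abstract follow directly from the reported counts (correctly classified records divided by total records). A small check, with a hypothetical helper name:

```python
def accuracy_percent(correct, total):
    # Accuracy as a percentage, rounded to two decimals as in the abstract.
    return round(100.0 * correct / total, 2)

first_test = accuracy_percent(146, 150)   # 146 of 150 product records correct
second_test = accuracy_percent(145, 150)  # 145 of 150 mixed-brand records correct
print(first_test, second_test)
```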

PUBLIC SENTIMENT ANALYSIS OF PASAR LAMA TANGERANG USING K-NEAREST NEIGHBOR METHOD AND PROGRAMMING LANGUAGE R

Jurnal Ilmiah Informatika Komputer ◽

10.35760/ik.2019.v24i2.2367 ◽

2019 ◽

Vol 24 (2) ◽

pp. 129-133

Author(s):

Hustinawaty ◽

Rama Al Azis Dwiputra ◽

Tavipia Rumambi

Keyword(s):

Sentiment Analysis ◽

Programming Languages ◽

Nearest Neighbor ◽

Tourist Attraction ◽

K Nearest Neighbor ◽

The Public ◽

Public Sentiment ◽

A Value ◽

Negative Comments ◽

The City

Pasar Lama Tangerang is a tourist attraction in the city of Tangerang. With current technology, the public can describe the facilities and services provided by expressing opinions on the internet. However, it is difficult to distinguish positive opinions from negative ones. Sentiment analysis is needed to overcome this problem. The sentiment analysis begins with data collection, after which the data are processed. The processed data are then given a sentiment classification using the K-Nearest Neighbor (KNN) algorithm. The classification achieved an accuracy of 83% at k = 1 on 120 records, comprising 92 positive and 28 negative comments. The sentiment analysis was built using the R programming language, with RStudio as supporting software.

Implementation Of The K-Nearest Neighbor (KNN) Algorithm For Classification Of Obesity Levels

JELIKU (Jurnal Elektronik Ilmu Komputer Udayana) ◽

10.24843/jlk.2020.v09.i02.p15 ◽

2020 ◽

Vol 9 (2) ◽

pp. 277

Author(s):

Ayu Made Surya Indra Dewi ◽

Ida Bagus Gede Dwidasmara

Keyword(s):

Family History ◽

Nearest Neighbor ◽

K Nearest Neighbor ◽

Early Prevention ◽

Unhealthy Lifestyle ◽

Keywords Obesity ◽

A Value ◽

Test Parameters ◽

Knn Classification

Obesity, or being overweight, is a health problem that can affect anyone. Research in several journals has found that obesity can be influenced by many factors, the most dominant being lifestyle and diet. Obesity should not be considered only a consequence of an unhealthy lifestyle; it is also a disease that can lead to other dangerous diseases. It is therefore important to know the level of obesity so that early prevention can be taken. To determine the level of obesity, this study uses the K-Nearest Neighbor (KNN) classification method. Classification was carried out with 16 test parameters (Gender, Age, Height, Weight, Family History With Overweight, FAVC, FCVC, NCP, CAEC, Smoke, CH2O, SCC, FAF, TUE, CALC, and Mtrans) and 1 class attribute, Nobesity. Tests with the KNN algorithm yielded an accuracy of 78.98% at k = 2. Keywords: Obesity, KNN, Classification.

Sentiment Analysis about Large-Scale Social Restrictions in Social Media Twitter Using Algoritm K-Nearest Neighbor

Jurnal Online Informatika ◽

10.15575/join.v6i1.670 ◽

2021 ◽

Vol 6 (1) ◽

pp. 96

Author(s):

Ikhsan Romli ◽

Shanti Prameswari R ◽

Antika Zahrotul Kamalia

Keyword(s):

Social Media ◽

Sentiment Analysis ◽

Large Scale ◽

Nearest Neighbor ◽

Cosine Similarity ◽

Manhattan Distance ◽

K Nearest Neighbor ◽

Distance Calculation ◽

K Nearest Neighbor Algorithm ◽

Similarity Distance

Sentiment analysis is a form of data processing that recognizes the topics people talk about and their sentiments toward those topics; the topic in this study is large-scale social restrictions (PSBB). The study aims to classify negative and positive sentiments by applying the K-Nearest Neighbor algorithm and to compare the accuracy of three distance calculations (cosine similarity, Euclidean distance, and Manhattan distance) on Indonesian-language tweets about PSBB from Twitter. K-Nearest Neighbor achieved an accuracy of 82% at k = 3 with cosine similarity, 81% at k = 11 with Euclidean distance, and 80% at k = 5, 7, 9, 11, and 13 with Manhattan distance. In this study, therefore, the K-Nearest Neighbor algorithm with the cosine similarity calculation achieved the highest accuracy.
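The comparison in this abstract amounts to running the same k-nearest-neighbor vote with a different distance function plugged in. The sketch below shows one way to structure that; the feature vectors, labels, and function names are invented for illustration, and cosine similarity is converted to a distance as `1 - similarity`, which is an assumption about how it would be used alongside the other two measures.

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    # 1 - cosine similarity, so that smaller values mean "closer", like the other metrics.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0

def knn_predict(query, points, labels, k, distance):
    # Pick the k training points with the smallest distance, then take a majority vote.
    nearest = sorted(range(len(points)), key=lambda i: distance(query, points[i]))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

# Hypothetical term-count vectors for five labeled tweets and one unlabeled tweet.
points = [[3, 0, 1], [2, 1, 0], [0, 3, 1], [1, 2, 0], [0, 2, 2]]
labels = ["positive", "positive", "negative", "negative", "negative"]
query = [2, 0, 1]

results = {
    name: knn_predict(query, points, labels, k=3, distance=fn)
    for name, fn in [("cosine", cosine_distance),
                     ("euclidean", euclidean),
                     ("manhattan", manhattan)]
}
print(results)
```

On real data the three measures can rank neighbors differently, which is why the study reports a separate best k for each one.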