Performance comparison between naive Bayes and k-nearest neighbor algorithm for the classification of Indonesian language articles

The match between an article's content and its theme is the main factor in whether the article is accepted. Many people still find it difficult to determine which theme fits their article. For that reason, a document classification algorithm is needed that can group articles automatically and accurately. Many classification algorithms can be used. This study uses naive Bayes, with the k-nearest neighbor algorithm as the baseline. Naive Bayes was chosen because it can produce maximum accuracy with little training data, while k-nearest neighbor was chosen because it is robust against noisy data. The performance of the two algorithms is compared to see which is better at classifying documents. The results show that naive Bayes performs better, with an accuracy rate of 88%, while k-nearest neighbor has a fairly low accuracy rate of 60%.


KOMPARASI ALGORITMA KLASIFIKASI PADA ANALISIS REVIEW HOTEL (Comparison of Classification Algorithms in Hotel Review Analysis)

Jurnal Pilar Nusa Mandiri ◽

10.33480/pilar.v14i2.1023 ◽

2018 ◽

Vol 14 (2) ◽

pp. 261

Author(s):

Lila Dini Utami

Keyword(s):

Support Vector Machine ◽

Nearest Neighbor ◽

Naive Bayes ◽

Service Providers ◽

Naïve Bayes ◽

Support Vector ◽

K Nearest Neighbor ◽

Nearest Neighbor Algorithm ◽

K Nearest Neighbor Algorithm ◽

Auc Value

Expressing opinions about anything, orally or in writing, is now very easy. Some businesses, especially service providers such as hotels, can use these opinions to make decisions, which is useful for developing the hotel business. The review data, however, must be processed with the right algorithm. This study was therefore conducted to find out which algorithm gives the highest accuracy. The methods used are Naïve Bayes (NB), Support Vector Machine (SVM), and k-Nearest Neighbor (k-NN). Naïve Bayes achieved an accuracy of 71.50% with an AUC value of 0.500, Support Vector Machine 72.50% with an AUC value of 0.936, and k-Nearest Neighbor 75.00% with an AUC value of 0.500. The k-Nearest Neighbor algorithm can therefore help in making more appropriate decisions based on hotel reviews.


Temporal Prediction on Students’ Graduation using Naïve Bayes and K-Nearest Neighbor Algorithm

JURNAL MEDIA INFORMATIKA BUDIDARMA ◽

10.30865/mib.v5i2.2919 ◽

2021 ◽

Vol 5 (2) ◽

pp. 682

Author(s):

Ahmad Marzuqi ◽

Kusuma Ayu Laksitowening ◽

Ibnu Asror

Keyword(s):

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

K Nearest Neighbor ◽

Nearest Neighbor Algorithm ◽

Temporal Prediction ◽

Study Programs ◽

Level 3 ◽

Student Graduation ◽

K Nearest Neighbor Algorithm

Accreditation is a form of assessment of the feasibility and quality of higher education. One of the accreditation assessment factors is the percentage of students who graduate on time. A low percentage of on-time graduations can affect the accreditation assessment of study programs. Predicting student graduation can be a solution to this problem, because the prediction results can show which students are at risk of not graduating on time. Temporal prediction allows students and study programs to take the necessary action early. Graduation can be predicted with a learning analytics method that combines the naïve Bayes and k-nearest neighbor algorithms. The naïve Bayes algorithm looks for the courses that most influence graduation. The k-nearest neighbor algorithm is used as the classification method, with the attributes limited to 40% of the total so that the algorithm becomes more effective and efficient. The dataset consists of four batches of Telkom University Informatics Engineering student data, comprising course score index data for level 1, level 2, level 3, and level 4. The study identifies the 5 attributes that most influence student graduation. The combination of naïve Bayes and k-nearest neighbor yields its largest percentages of 75.40% at level 1, 82.08% at level 2, 81.91% at level 3, and 90.42% at level 4.


Comparison Analysis of K-Nearest Neighbor and Naïve Bayes in Determining Talent of Adolescence

International Journal of Artificial Intelligence Research ◽

10.29099/ijair.v4i1.118 ◽

2020 ◽

Vol 4 (1) ◽

Author(s):

Yessi Jusman ◽

Widdya Rahmalina ◽

Juni Zarman

Keyword(s):

Nearest Neighbor ◽

Naive Bayes ◽

Confusion Matrix ◽

Naïve Bayes ◽

Training Data ◽

K Nearest Neighbor ◽

Combined Training ◽

Testing Data ◽

Bayes Algorithm ◽

Children's Interests

Adolescents are always searching for an identity to shape their character. This paper aims to use artificial intelligence analysis to determine the talents of adolescents. The study uses a sample of children aged 10-18 years, with testing data consisting of 100 respondents. The algorithms used for the analysis are K-Nearest Neighbor and Naive Bayes. The analysis reports the classification accuracy of both algorithms. To find out which algorithm is more accurate in determining children's interests and talents, accuracy is assessed with the confusion matrix in the RapidMiner software for training data, testing data, and combined training and testing data. This study concludes that the K-Nearest Neighbor algorithm is better than Naive Bayes in terms of classification accuracy.


K-Nearest Neighbor and Naive Bayes Classifier Algorithm in Determining The Classification of Healthy Card Indonesia Giving to The Poor

Scientific Journal of Informatics ◽

10.15294/sji.v5i1.12057 ◽

2018 ◽

Vol 5 (1) ◽

pp. 18 ◽

Cited By ~ 2

Author(s):

Yofi Firdan Safri ◽

Riza Arifudin ◽

Much Aziz Muslim

Keyword(s):

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

K Nearest Neighbor ◽

Naïve Bayes Classifier ◽

Nearest Neighbor Algorithm ◽

K Nearest Neighbor Algorithm ◽

The Government

Health is a human right and one of the elements of welfare that must be realized by providing various health efforts to all the people of Indonesia. Poverty in Indonesia has become a national problem, and the government is making efforts to alleviate it. Poor families, for example, have relatively low levels of livelihood and health. One of the government's new policies, the Sakti Government Card Program, includes three cards: the Indonesia Smart Card (KIP), the Healthy Indonesia Card (KIS), and the Prosperous Family Card (KKS). Determining eligibility for the Healthy Indonesia Card (KIS) requires a method with optimal accuracy. The data used in this study are 200 KIS records from 2017 with 15 eligibility determinants, taken from the Social Service of Pekalongan Regency. The data were processed using the K-Nearest Neighbor algorithm and a combination of the K-Nearest Neighbor and Naive Bayes Classifier algorithms. The K-Nearest Neighbor algorithm determined eligibility with an accuracy of 64%, while the combined K-Nearest Neighbor-Naive Bayes Classifier algorithm reached 96%, an increase of 32 percentage points. The combination is therefore the better algorithm for determining the eligibility of Healthy Indonesia Card recipients.


Comparison of Data Mining Methods Using the Naïve Bayes Algorithm and K-Nearest Neighbor in Predicting Immunotherapy Success

Tech-E ◽

10.31253/te.v2i2.139 ◽

2019 ◽

Vol 2 (2) ◽

pp. 30

Author(s):

Budi Harto ◽

Rino Rino

Keyword(s):

Data Mining ◽

Nearest Neighbor ◽

Naive Bayes ◽

Rapid Development ◽

Naïve Bayes ◽

K Nearest Neighbor ◽

Risk Of Death ◽

K Nearest Neighbor Algorithm ◽

Mining Methods ◽

Bayes Algorithm

Tumors and cancer are diseases that affect an increasing number of people every year. The disease requires attention in both its early and final stages because sufferers face a high risk of death. With the rapid development of technology, technology can be used in many fields, one of which is predicting the success of a therapy. Data mining is the technique the authors use to test the dataset in this study and find the better algorithm between Naïve Bayes and K-Nearest Neighbor, using the RapidMiner Studio application. The better algorithm is then applied in an application or expert system that is expected to help users predict the success of a therapy.


RB-Bayes algorithm for the prediction of diabetic in Pima Indian dataset

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v9i6.pp4866-4872 ◽

2019 ◽

Vol 9 (6) ◽

pp. 4866

Author(s):

Rajni Rajni ◽

Amandeep Amandeep

Keyword(s):

Nearest Neighbor ◽

Naive Bayes ◽

Early Stage ◽

Human Life ◽

Naïve Bayes ◽

Support Vector ◽

Pima Indians ◽

K Nearest Neighbor ◽

Fast Pace ◽

Bayes Algorithm

Diabetes is a major concern all over the world, and it is increasing at a fast pace. People can avoid diabetes at an early stage without any test. The goal of this paper is to predict, at an early stage, the probability that a person is at risk of diabetes. This would have a great impact on their quality of life. The datasets are Pima Indians diabetes and Cleveland heart disease, and consist of 768 records. Although a number of solutions are available for extracting information from huge datasets and predicting the possibility of diabetes, their accuracy is far from satisfactory. To achieve the highest accuracy, the zero-probability issue generally faced by naïve Bayes analysis needs to be addressed suitably. The proposed framework, RB-Bayes, aims to extract the required information with high accuracy, to overcome the zero-probability problem, and to compare its accuracy with other methods such as Support Vector Machine, Naive Bayes, and K Nearest Neighbor. We calculated the mean to handle missing data and calculated the probability for yes (positive) and no (negative). The higher of the yes and no values decides the class of the tuple. It is mostly used in text classification. The results on the Pima Indian diabetes dataset demonstrate that the proposed method improves precision compared with other supervised techniques. The accuracy of the proposed method on the large dataset is 72.9%.
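The zero-probability issue mentioned in this abstract can be illustrated with a small sketch. The abstract does not describe how RB-Bayes handles it, so the code below uses standard add-alpha (Laplace) smoothing instead, purely as an illustration. The toy data, the feature name, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

def train_counts(rows, labels):
    """Count how often each feature value occurs within each class."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, feature index) -> value counts
    vocab = defaultdict(set)             # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab

def likelihood(value, label, i, class_counts, value_counts, vocab, alpha=1.0):
    """P(feature i = value | label) with add-alpha (Laplace) smoothing."""
    count = value_counts[(label, i)][value]
    return (count + alpha) / (class_counts[label] + alpha * len(vocab[i]))

# Hypothetical toy data: one categorical feature (a glucose band), two classes.
rows = [("high",), ("high",), ("normal",), ("normal",), ("low",)]
labels = ["yes", "yes", "no", "no", "no"]
cc, vc, vocab = train_counts(rows, labels)

# "low" never occurs with class "yes", so the unsmoothed estimate is zero
# and would wipe out the whole product of likelihoods for that class.
unsmoothed = likelihood("low", "yes", 0, cc, vc, vocab, alpha=0.0)
smoothed = likelihood("low", "yes", 0, cc, vc, vocab, alpha=1.0)
print(unsmoothed, smoothed)
```

With alpha set to 1, every value gets a small nonzero probability, and the smoothed likelihoods for a class still sum to 1 across the observed values.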


Business Intelligence using the K-Nearest Neighbor Algorithm to Analyze Customer Behavior in Online Crowdfunding Systems

E3S Web of Conferences ◽

10.1051/e3sconf/202020216005 ◽

2020 ◽

Vol 202 ◽

pp. 16005

Author(s):

Chashif Syadzali ◽

Suryono Suryono ◽

Jatmiko Endro Suseno

Keyword(s):

Business Intelligence ◽

Nearest Neighbor ◽

Customer Behavior ◽

Training Data ◽

Business Strategies ◽

Intelligence Analysis ◽

K Nearest Neighbor ◽

Nearest Neighbor Algorithm ◽

K Nearest Neighbor Algorithm

Customer behavior classification can help companies conduct business intelligence analysis. Data mining techniques can classify customer behavior using the K-Nearest Neighbor algorithm based on the customer's life cycle, which consists of prospect, responder, active, and former. The data used for classification include age, gender, number of donations, donation retention, and number of user visits. Classifying 2,114 records into the customer categories gives active 1.18%, prospect 8.99%, responder 4.26%, and former 85.57%. Testing K from K = 1 to K = 20 shows that the highest accuracy, 94.3731%, is obtained at K = 4. The classification of user behavior produced from the training data can be used for Business Intelligence analysis, helping companies determine business strategies by identifying the optimal target market.
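Scanning a range of K values and keeping the most accurate one, as this abstract describes, might look like the following minimal sketch. The abstract does not say how accuracy was measured, so leave-one-out evaluation is an assumption here, and the toy data and function names are hypothetical.

```python
from collections import Counter
import math

def knn_predict(train, query, k):
    """Majority vote among the k training points closest to query (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def leave_one_out_accuracy(data, k):
    """Fraction of points classified correctly when each is held out in turn."""
    correct = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if knn_predict(rest, features, k) == label:
            correct += 1
    return correct / len(data)

def best_k(data, k_values):
    """Return (k, accuracy) with the highest accuracy; ties go to the earlier k."""
    scores = [(k, leave_one_out_accuracy(data, k)) for k in k_values]
    return max(scores, key=lambda pair: pair[1])

# Hypothetical toy data: (donations, visits) pairs with a life-cycle label.
data = [
    ((1, 1), "prospect"), ((1, 2), "prospect"),
    ((2, 1), "prospect"), ((2, 2), "prospect"),
    ((8, 8), "active"), ((8, 9), "active"),
    ((9, 8), "active"), ((9, 9), "active"),
]
print(best_k(data, range(1, 8)))
```

On real data the candidate range would be 1 to 20 as in the abstract. The toy set has only eight points, so the scan stops at 7.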


COMPARING NAIVE BAYES, K-NEAREST NEIGHBOR, AND NEURAL NETWORK CLASSIFICATION METHODS OF SEAT LOAD FACTOR IN LOMBOK OUTBOUND FLIGHTS

Jurnal Matematika Statistika dan Komputasi ◽

10.20956/jmsk.v16i2.7864 ◽

2019 ◽

Vol 16 (2) ◽

pp. 187

Author(s):

Mega Luna Suliztia ◽

Achmad Fauzan

Keyword(s):

Neural Network ◽

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

Load Factor ◽

Classification Methods ◽

Affecting Factors ◽

K Nearest Neighbor ◽

Accuracy Rate ◽

Ticket Prices

Classification is the process of grouping data based on observed variables in order to predict the class of new data. Classification methods include Naïve Bayes, K-Nearest Neighbor, and Neural Network. Naïve Bayes classifies based on the probability values of the existing attributes. K-Nearest Neighbor classifies based on the characteristics of the nearest neighbors, where the number of neighbors is k, while Neural Network classifies with a model based on human neural networks. This study compares the three classification methods for Seat Load Factor, which is the percentage of aircraft load and also a measure for determining airline profit. The affecting factors are the number of passengers, ticket prices, flight routes, and flight times. Based on the analysis of 47 data points, the Naïve Bayes method misclassifies 14, so its accuracy rate is 70%. The K-Nearest Neighbor method with k=5 misclassifies 5, so its accuracy rate is 89%, and the Neural Network misclassifies 10, for an accuracy rate of 78%. The method with the highest accuracy rate is the best method, which in this case is K-Nearest Neighbor; it correctly classifies 42 data points, comprising 14 low, 10 medium, and 18 high values. Using the best method, predictions can be made for new data. For example, take new data consisting of the Bali flight route (2), an afternoon flight time (2), an estimated 140 passengers, and a ticket price of Rp.700,000. With the K-Nearest Neighbor method, the predicted Seat Load Factor is high, in the interval 80%-100%.
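The accuracy rates quoted in this abstract follow from the misclassification counts, since accuracy is the share of the 47 data points classified correctly. A short check, with a hypothetical helper name:

```python
def accuracy(total, misclassified):
    """Share of correctly classified items."""
    return (total - misclassified) / total

TOTAL = 47  # number of data points reported in the abstract
errors = {
    "Naive Bayes": 14,
    "K-Nearest Neighbor (k=5)": 5,
    "Neural Network": 10,
}
rates = {name: accuracy(TOTAL, wrong) for name, wrong in errors.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")
# Naive Bayes: 70.2%
# K-Nearest Neighbor (k=5): 89.4%
# Neural Network: 78.7%
```

These agree with the 70%, 89%, and 78% in the abstract, which appear to be truncated to whole percentages.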


Analysis K-Nearest Neighbor Algorithm for Improving Prediction Student Graduation Time

SinkrOn ◽

10.33395/sinkron.v4i2.10480 ◽

2020 ◽

Vol 4 (2) ◽

pp. 42

Author(s):

Rizki Muliono ◽

Juanda Hakim Lubis ◽

Nurul Khairina

Keyword(s):

Higher Education ◽

Nearest Neighbor ◽

Training Data ◽

K Nearest Neighbor ◽

Nearest Neighbor Algorithm ◽

Study Program ◽

Sample Data ◽

Student Graduation ◽

K Nearest Neighbor Algorithm

Higher education plays a major role in improving the quality of education in Indonesia. BAN-PT, an institution established by the government, sets the standards for higher education accreditation and study program accreditation. The 4.0-based accreditation instrument encourages university leaders to improve the quality of their education. One indicator that determines the accreditation of study programs is the timely graduation of students. This study uses the K-Nearest Neighbor algorithm to predict student graduation times. Students' GPA in the seventh semester is used as training data, and data on students who have graduated are used as sample data. K-Nearest Neighbor works in accordance with the given sample data. Prediction testing on 60 data points for students of 2015-2016 achieved the highest accuracy of 98.5% at k = 3. Prediction results depend on the pattern of the data entered: the more sample and training data used, the more accurate the K-Nearest Neighbor calculation becomes.


NEWS TOPIC CLASSIFICATION ON TRIBUNNEWS ONLINE MEDIA USING K-NEAREST NEIGHBOR ALGORITHM

Journal of Information Technology and Its Utilization ◽

10.30818/jitu.1.2.1879 ◽

2018 ◽

Vol 1 (2) ◽

pp. 38

Author(s):

Nfn Herman

Keyword(s):

Nearest Neighbor ◽

Online Media ◽

K Nearest Neighbor ◽

Accuracy Rate ◽

Nearest Neighbor Algorithm ◽

K Nearest Neighbor Algorithm ◽

Manual Classification

Online media journalists, such as Tribunnews journalists, usually determine the news category when they input news. Unfortunately, the topic submitted often does not match what the editor expects. These errors make it difficult for customers to search for news. To eliminate these errors, editors can be assisted by an application that is able to classify topics, so that editors are no longer too dependent on journalist input. This study aims to design an application that is able to classify topics based on the text of the news. The method used is the K-Nearest Neighbor algorithm. The design has produced a system that is able to classify news topics automatically. To measure the accuracy of the application, several tests were carried out by comparing its results with the results of manual classification by the editor. The tests, carried out with several scenarios, produce an accuracy rate of 82%.
