Performance comparison between naive bayes and k-nearest neighbor algorithm for the classification of Indonesian language articles

Author(s):  
Titin Winarti ◽  
Henny Indriyawati ◽  
Vensy Vydia ◽  
Febrian Wahyu Christanto

The match between an article's contents and its theme is the main factor in whether the article is accepted. Many people still find it difficult to determine the theme that suits their article. For that reason, a document classification algorithm is needed that can group articles automatically and accurately. Many classification algorithms can be used; this study uses naive bayes, with the k-nearest neighbor algorithm as the baseline. The naive bayes algorithm was chosen because it can achieve high accuracy with little training data, while the k-nearest neighbor algorithm was chosen because it is robust to noisy data. The performance of the two algorithms is compared to determine which is better at classifying documents. The results show that the naive bayes algorithm performs much better, with an accuracy of 88%, while the k-nearest neighbor algorithm has a fairly low accuracy of 60%.
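A minimal sketch of a multinomial naive bayes text classifier, to make the comparison concrete. The abstract does not say which naive bayes variant, preprocessing, or smoothing the authors used; the multinomial model, whitespace tokenization, add-one smoothing, and the toy Indonesian corpus below are assumptions for illustration only.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase whitespace tokenization only; a real Indonesian pipeline
    # would usually add stopword removal and stemming.
    return text.lower().split()


def train_multinomial_nb(docs, labels):
    """Collect class counts and per-class word counts."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_multinomial_nb(model, doc):
    """Return the class maximizing log P(class) + sum of log P(word | class)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(doc):
            # Add-one smoothing keeps unseen words from zeroing the score.
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy corpus: two article themes with a few words each.
train_docs = [
    "algoritma klasifikasi data mining",
    "jaringan komputer dan algoritma",
    "inflasi harga pasar ekonomi",
    "pasar saham dan inflasi",
]
train_labels = ["informatika", "informatika", "ekonomi", "ekonomi"]

model = train_multinomial_nb(train_docs, train_labels)
print(predict_multinomial_nb(model, "algoritma data"))  # informatika
```

Working in log space avoids numeric underflow when documents are long; the decision is the same as multiplying the probabilities.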

2018 ◽  
Vol 14 (2) ◽  
pp. 261
Author(s):  
Lila Dini Utami

Today, people can easily express opinions about almost anything, orally or in writing. Some business people, especially service providers such as hotels, can use these opinions to make decisions, which is very useful for developing the hotel business itself. However, the review data must be processed with the right algorithm. This study was therefore conducted to determine which algorithm is most suitable for achieving the highest accuracy. The methods used are Naïve Bayes (NB), Support Vector Machine (SVM), and k-Nearest Neighbor (k-NN). The results show an accuracy of 71.50% with an AUC of 0.500 for Naïve Bayes, 72.50% with an AUC of 0.936 for Support Vector Machine, and 75.00% with an AUC of 0.500 for k-Nearest Neighbor. Using the k-Nearest Neighbor algorithm can therefore help in making more appropriate decisions based on hotel reviews.
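Accuracy and AUC measure different things, so a high accuracy does not imply a high AUC. The sketch below uses hypothetical labels and scores, not the study's data. It computes accuracy from hard predictions and a rank-based AUC from scores. A model whose scores do not rank positives above negatives (here, constant scores) gets an AUC of exactly 0.5 while still reaching 75% accuracy on imbalanced data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """Rank-based AUC: the probability that a randomly chosen positive
    example gets a higher score than a randomly chosen negative one,
    counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced reviews: 1 = positive, 0 = negative.
y_true = [1, 1, 1, 0]
constant_scores = [0.9, 0.9, 0.9, 0.9]  # scores that cannot rank anything
y_pred = [1 if s >= 0.5 else 0 for s in constant_scores]
print(accuracy(y_true, y_pred), roc_auc(y_true, constant_scores))  # 0.75 0.5
```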


2021 ◽  
Vol 5 (2) ◽  
pp. 682
Author(s):  
Ahmad Marzuqi ◽  
Kusuma Ayu Laksitowening ◽  
Ibnu Asror

Accreditation is a form of assessment of the feasibility and quality of higher education. One of the accreditation assessment factors is the percentage of students who graduate on time, and a low on-time graduation percentage can affect the accreditation assessment of study programs. Predicting student graduation can be a solution to this problem: the prediction results can identify students who are at risk of not graduating on time, and temporal prediction allows students and study programs to take the necessary action early. Graduation can be predicted with a learning analytics method that combines the naïve bayes and k-nearest neighbor algorithms. The naïve bayes algorithm identifies the courses that most influence graduation. The k-nearest neighbor algorithm serves as the classification method, limited to 40% of the total attributes so that the algorithm becomes more effective and efficient. The dataset consists of four cohorts of Telkom University Informatics Engineering students, with course score index data at level 1, level 2, level 3, and level 4. The study identifies the 5 attributes that most influence student graduation. The combination of naïve bayes and k-nearest neighbor yields its largest percentages of 75.40% at level 1, 82.08% at level 2, 81.91% at level 3, and 90.42% at level 4.
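The abstract does not specify how the naive bayes step ranks courses. The sketch below therefore takes the attribute scores as given (hypothetical values) and shows only the two mechanical steps the abstract does describe: keeping 40% of the attributes and running k-nearest neighbor on that subset. The rounding rule, Euclidean distance, k = 3, and the course-score rows are assumptions.

```python
import math
from collections import Counter


def top_fraction(attribute_scores, fraction=0.4):
    """Indices of the highest-scoring attributes, keeping about
    fraction * n of them (rounded, at least one), in original order."""
    n_keep = max(1, round(fraction * len(attribute_scores)))
    ranked = sorted(range(len(attribute_scores)),
                    key=lambda i: attribute_scores[i], reverse=True)
    return sorted(ranked[:n_keep])


def knn_predict(train_X, train_y, query, k, features):
    """Majority vote of the k nearest rows, using Euclidean distance
    over the selected feature indices only."""
    def distance(row):
        return math.sqrt(sum((row[f] - query[f]) ** 2 for f in features))
    nearest = sorted(range(len(train_X)), key=lambda i: distance(train_X[i]))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


# Hypothetical influence scores for five course attributes (e.g. produced
# by a naive bayes analysis) and hypothetical course-score rows.
scores = [0.9, 0.1, 0.7, 0.2, 0.3]
selected = top_fraction(scores)  # keeps 2 of 5 attributes: [0, 2]

train_X = [
    [3.8, 1.0, 3.5, 2.0, 2.0],
    [3.5, 4.0, 3.9, 1.0, 3.0],
    [3.9, 2.0, 3.7, 3.0, 1.0],
    [1.5, 3.9, 1.8, 2.0, 2.0],
    [2.0, 1.0, 1.5, 3.0, 4.0],
    [1.8, 4.0, 2.1, 1.0, 3.0],
]
train_y = ["on_time", "on_time", "on_time", "late", "late", "late"]
print(knn_predict(train_X, train_y, [3.6, 1.0, 3.4, 2.0, 2.0], 3, selected))  # on_time
```

Restricting the distance to fewer attributes is what makes each neighbor search cheaper. It also keeps weakly related courses from dominating the distance.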


Author(s):  
Yessi Jusman ◽  
Widdya Rahmalina ◽  
Juni Zarman

Adolescents continually search for an identity that shapes their personality and character. This paper aims to use artificial intelligence analysis to determine the talents of adolescents. The study uses a sample of children aged 10 to 18 years, with testing data from 100 respondents. The algorithms used for analysis are K-Nearest Neighbor and Naive Bayes, and the analysis compares the classification accuracy of both algorithms. To determine which algorithm is more accurate in identifying children's interests and talents, accuracy is computed from the confusion matrix in the RapidMiner software for the training data, the testing data, and the combined training and testing data. This study concludes that the K-Nearest Neighbor algorithm is better than Naive Bayes in terms of classification accuracy.
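Accuracy from a confusion matrix is the share of predictions that fall on its diagonal. The minimal Python sketch below shows that computation. The talent categories and predictions are hypothetical stand-ins for the study's respondents.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(actual, predicted)] for predicted in labels]
            for actual in labels]


def accuracy_from_matrix(matrix):
    """Correct predictions lie on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical talent labels for six respondents.
labels = ["arts", "sports", "science"]
y_true = ["arts", "sports", "science", "arts", "science", "sports"]
y_pred = ["arts", "sports", "arts", "arts", "science", "science"]

matrix = confusion_matrix(y_true, y_pred, labels)
print(matrix)                        # [[2, 0, 0], [0, 1, 1], [1, 0, 1]]
print(accuracy_from_matrix(matrix))  # 4 of 6 correct
```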


2018 ◽  
Vol 5 (1) ◽  
pp. 18 ◽  
Author(s):  
Yofi Firdan Safri ◽  
Riza Arifudin ◽  
Much Aziz Muslim

Health is a human right and one of the elements of welfare, and it must be realized by providing various health services to all the people of Indonesia. Poverty has become a national problem in Indonesia, and the government is making efforts to alleviate it; poor families, for example, have relatively low levels of livelihood and health. One of the government's new policies, the Sakti Government Card Program, includes three cards: the Indonesia Smart Card (KIP), the Healthy Indonesia Card (KIS), and the Prosperous Family Card (KKS). Determining eligibility for the Healthy Indonesia Card (KIS) requires a method with optimal accuracy. The data used in this study are 200 KIS records from 2017 with 15 eligibility determinants, obtained from the Social Service of Pekalongan Regency. The data were processed using the K-Nearest Neighbor algorithm and the combined K-Nearest Neighbor-Naive Bayes Classifier algorithm. The K-Nearest Neighbor algorithm determines eligibility with an accuracy of 64%, while the combined K-Nearest Neighbor-Naive Bayes Classifier algorithm reaches 96%, an increase of 32 percentage points. This makes the combination the optimal algorithm for determining eligible Healthy Indonesia Card recipients. This study shows that determining eligibility with the combined K-Nearest Neighbor-Naive Bayes Classifier algorithm is more accurate than with the K-Nearest Neighbor algorithm alone.


Tech-E ◽  
2019 ◽  
Vol 2 (2) ◽  
pp. 30
Author(s):  
Budi Harto ◽  
Rino Rino

Tumors and cancer are diseases whose number of sufferers increases every year. The disease requires attention in both its early and final stages because sufferers have a high risk of death. With the rapid development of technology, technology can be used to help in many fields, one of which is predicting the success of a therapy. In this study, the author uses data mining to test the dataset and determine the better algorithm between Naïve Bayes and K-Nearest Neighbor, using the RapidMiner Studio application. The author then applies the best algorithm in the intended application or expert system, which can help users predict the success of a therapy.


Author(s):  
Rajni Rajni ◽  
Amandeep Amandeep

Diabetes is a major concern all over the world, and it is increasing at a fast pace. People could avoid diabetes if their risk were identified at an early stage, without any test. The goal of this paper is to predict, at an early stage, the probability that a person is at risk of diabetes, which could have a great impact on their quality of life. The datasets are Pima Indians diabetes and Cleveland coronary illness and consist of 768 records. A number of solutions are available for extracting information from huge datasets and predicting the possibility of having diabetes, but the accuracy of their mining process is far from satisfactory. To achieve the highest accuracy, the zero-probability problem generally faced by naïve bayes analysis needs to be addressed suitably. The proposed framework, RB-Bayes, aims to extract the required information with high accuracy while overcoming the zero-probability problem. It also compares its accuracy with other methods such as Support Vector Machine, Naive Bayes, and K Nearest Neighbor. We used the mean to handle missing data and calculated the probabilities of yes (positive) and no (negative); the higher of the two determines the class of each tuple. This approach is mostly used in text classification. The results on the Pima Indian diabetes dataset demonstrate that the proposed methodology improves precision compared with other supervised methods. The accuracy of the proposed methodology on the large dataset is 72.9%.
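The abstract does not describe how RB-Bayes avoids zero probabilities. A standard remedy is additive (Laplace) smoothing, shown here as a swapped-in technique rather than the authors' method. The sketch also includes mean imputation for missing values and the yes/no decision rule described above. The binned glucose and BMI attributes are hypothetical. The Pima features are numeric and would need discretization before a categorical model like this could use them.

```python
from collections import Counter, defaultdict


def impute_mean(column):
    """Replace missing (None) entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def train_categorical_nb(rows, labels, alpha=1.0):
    """Count attribute values per class; alpha is the additive smoothing
    constant (alpha=1 is Laplace smoothing, alpha=0 is no smoothing)."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, attribute index) -> counts
    values_per_attr = defaultdict(set)   # attribute index -> distinct values
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            value_counts[(label, j)][value] += 1
            values_per_attr[j].add(value)
    return class_counts, value_counts, values_per_attr, alpha


def predict_categorical_nb(model, row):
    """Score each class as P(class) * prod P(value | class); the class with
    the higher score labels the tuple."""
    class_counts, value_counts, values_per_attr, alpha = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total
        for j, value in enumerate(row):
            k = len(values_per_attr[j])
            score *= (value_counts[(label, j)][value] + alpha) / (n + alpha * k)
        scores[label] = score
    return max(scores, key=scores.get), scores


# Hypothetical binned attributes (glucose, BMI); "normal" glucose never
# occurs with "yes" in this toy training set.
rows = [("high", "obese"), ("high", "normal"), ("high", "obese"),
        ("normal", "normal"), ("normal", "normal"), ("normal", "obese")]
labels = ["yes", "yes", "yes", "no", "no", "no"]

unsmoothed = train_categorical_nb(rows, labels, alpha=0.0)
smoothed = train_categorical_nb(rows, labels, alpha=1.0)
print(predict_categorical_nb(unsmoothed, ("normal", "obese"))[1]["yes"])  # 0.0
print(predict_categorical_nb(smoothed, ("normal", "obese")))
print(impute_mean([100, None, 140]))  # [100, 120.0, 140]
```

Without smoothing, a single unseen attribute value forces the "yes" score to zero regardless of the other evidence. With alpha = 1, that score stays small but non-zero.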


2020 ◽  
Vol 202 ◽  
pp. 16005
Author(s):  
Chashif Syadzali ◽  
Suryono Suryono ◽  
Jatmiko Endro Suseno

Customer behavior classification can help companies conduct business intelligence analysis. Data mining techniques can classify customer behavior with the K-Nearest Neighbor algorithm based on the customer life cycle, which consists of prospect, responder, active, and former. The data used for classification include age, gender, number of donations, donation retention, and number of user visits. Of the 2,114 records classified, 1.18% fall into the active category, 8.99% prospect, 4.26% responder, and 85.57% former. Testing values of K from K = 1 to K = 20, the system reaches its highest accuracy of 94.3731% at K = 4. The resulting classification of user behavior can be used in Business Intelligence analysis, helping companies determine business strategies by identifying the optimal target market.
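Selecting K by sweeping a range, as the study does from K = 1 to K = 20, can be sketched as follows. Several details are hypothetical: the features (age, number of donations, number of visits), the two lifecycle labels, the held-out split, and the tiny dataset with one noisy point. Because of that noisy point, K = 1 is misled and a larger K recovers.

```python
import math
from collections import Counter


def knn_classify(train, query, k):
    """train: list of (features, label). Majority vote of the k nearest
    examples by Euclidean distance."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def best_k(train, test, k_values=range(1, 21)):
    """Evaluate every K on a held-out test set and return (K, accuracy)
    for the most accurate one; ties keep the smallest K."""
    best = (None, -1.0)
    for k in k_values:
        correct = sum(knn_classify(train, x, k) == y for x, y in test)
        accuracy = correct / len(test)
        if accuracy > best[1]:
            best = (k, accuracy)
    return best


# Hypothetical customers: (age, number of donations, number of visits).
# The last training row is a deliberately mislabeled, noisy point.
train = [
    ((25, 10, 30), "active"), ((30, 12, 25), "active"), ((28, 9, 28), "active"),
    ((50, 1, 2), "former"), ((45, 0, 1), "former"), ((55, 2, 3), "former"),
    ((51, 1, 3), "active"),
]
test = [((27, 11, 27), "active"), ((52, 1, 2), "former")]
print(best_k(train, test))  # (3, 1.0)
```

On real data the sweep should be scored on data the model was not trained on, as here; scoring on the training set always favors K = 1.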


2019 ◽  
Vol 16 (2) ◽  
pp. 187
Author(s):  
Mega Luna Suliztia ◽  
Achmad Fauzan

Classification is the process of grouping data based on observed variables in order to predict the class of new data whose class is unknown. Classification methods include Naïve Bayes, K-Nearest Neighbor, and Neural Network. Naïve Bayes classifies based on the probability values of the existing attributes. K-Nearest Neighbor classifies based on the characteristics of the nearest neighbors, where the number of neighbors is k, while Neural Network classifies based on a model of human neural networks. This study compares the three classification methods on Seat Load Factor, which is the percentage of aircraft load and also a measure for determining an airline's profit. The influencing factors are the number of passengers, ticket prices, flight routes, and flight times. In the analysis of 47 records, the Naïve Bayes method misclassifies 14 records, for an accuracy rate of 70%. The K-Nearest Neighbor method with k=5 misclassifies 5 records, for an accuracy rate of 89%, and the Neural Network misclassifies 10 records, for an accuracy rate of 78%. The method with the highest accuracy rate is the best method to use, which in this case is K-Nearest Neighbor. It correctly classifies 42 records: 14 low, 10 medium, and 18 high. Predictions can then be made on new data with the best method. For example, the new data might consist of the Bali flight route (2), an afternoon flight time (2), an estimated 140 passengers, and a ticket price of Rp.700,000. For this input, the K-Nearest Neighbor method predicts a high Seat Load Factor, in the 80% to 100% interval.
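Each accuracy rate above is (47 - misclassified) / 47. A quick check of the reported figures in Python:

```python
# Misclassification counts reported in the abstract, out of 47 records.
TOTAL = 47
misclassified = {
    "Naive Bayes": 14,
    "K-Nearest Neighbor (k=5)": 5,
    "Neural Network": 10,
}

# accuracy = (total - misclassified) / total
accuracies = {method: (TOTAL - errors) / TOTAL
              for method, errors in misclassified.items()}

for method, acc in accuracies.items():
    print(f"{method}: {TOTAL - misclassified[method]}/{TOTAL} correct = {acc:.1%}")

# The 42 correct K-Nearest Neighbor predictions split as 14 low + 10 medium + 18 high.
assert 14 + 10 + 18 == TOTAL - misclassified["K-Nearest Neighbor (k=5)"]
```

Running it prints 70.2%, 89.4%, and 78.7%. The reported 70%, 89%, and 78% are therefore these values with the decimal dropped; 78% is 78.7% truncated rather than rounded.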


SinkrOn ◽  
2020 ◽  
Vol 4 (2) ◽  
pp. 42
Author(s):  
Rizki Muliono ◽  
Juanda Hakim Lubis ◽  
Nurul Khairina

Higher education plays a major role in improving the quality of education in Indonesia. BAN-PT, the institution established by the government, has standards for higher education accreditation and study program accreditation. The 4.0-based accreditation instrument encourages university leaders to improve the quality of their education. One indicator that determines the accreditation of study programs is the timely graduation of students. This study uses the K-Nearest Neighbor algorithm to predict student graduation times. Students' GPAs in the seventh semester are used as training data, and data of students who have graduated are used as sample data; K-Nearest Neighbor classifies according to the given sample data. Prediction testing on 60 records of 2015-2016 students achieved the highest accuracy, 98.5%, at k = 3. Prediction results depend on the pattern of the input data: the more sample and training data are used, the more accurate the K-Nearest Neighbor calculation becomes.


2018 ◽  
Vol 1 (2) ◽  
pp. 38
Author(s):  
Nfn Herman

Online media journalists, such as tribunnews journalists, usually determine the news category when entering a news item. Unfortunately, the submitted topic often does not match what the editor expects. These errors make it difficult for customers to search for news. To eliminate these errors, editors can be assisted by an application that can classify topics, so that they are no longer overly dependent on journalists' input. This study aims to design an application that classifies topics based on the text of the news. The method used is the K-Nearest Neighbor algorithm. The resulting system can classify news topics automatically. To measure the accuracy of the application, several tests were carried out comparing its results with the editor's manual classification. The tests, carried out under several scenarios, produce an accuracy rate of 82%.
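The abstract does not state how news texts are represented or compared. A common choice, assumed here, is term-frequency vectors with cosine similarity. The sketch below classifies a news text by majority vote among its k most similar labeled items. The Indonesian snippets, topic names, and k = 3 are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Raw term frequencies from lowercase whitespace tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_topic(labeled_news, text, k=3):
    """labeled_news: list of (text, topic). Vote among the k most similar items."""
    query = term_vector(text)
    ranked = sorted(labeled_news,
                    key=lambda item: cosine(term_vector(item[0]), query),
                    reverse=True)
    return Counter(topic for _, topic in ranked[:k]).most_common(1)[0][0]


# Hypothetical labeled news snippets.
news = [
    ("timnas sepak bola menang di stadion", "olahraga"),
    ("pemain sepak bola cetak gol", "olahraga"),
    ("pelatih timnas umumkan pemain", "olahraga"),
    ("harga saham naik di bursa", "ekonomi"),
    ("bursa efek catat kenaikan harga", "ekonomi"),
    ("inflasi tekan harga bahan pokok", "ekonomi"),
]
print(knn_topic(news, "timnas sepak bola cetak gol"))  # olahraga
```

Cosine similarity ignores document length, so a long article and a short headline about the same topic can still be close neighbors.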

