Hybrid Feature Selection Method Based on a Naïve Bayes Algorithm that Enhances the Learning Speed while Maintaining a Similar Error Rate in Cyber ISR

Compiler ◽  
2016 ◽  
Vol 5 (2) ◽  
Author(s):  
Siti Anisah ◽  
Anton Setiawan Honggowibowo ◽  
Asih Pujiastuti

A comic has its own characteristics compared to other types of books: the differences can be seen in its period, material, and physical form. Both comics and other books need a classification system. To address this problem, a classification system was built using Chi-Square feature selection and the Naive Bayes algorithm to identify comics based on period, material, and physical form. The system was implemented in the Delphi programming language with an Oracle database. Chi-Square feature selection yielded a score of 0.10347 for the comic trait and 1.9531 for the non-comic trait. The data were then classified with the Naive Bayes algorithm. Of 120 titles, 60 comic and non-comic titles were used to build the training classes and 60 comic and non-comic titles were used for testing. The Naive Bayes algorithm achieved 96.67% accuracy for comics with a 3.33% error rate, and 90% for non-comics with a 10% error rate, so the classification of comics performs well.


Author(s):  
I Guna Adi Socrates ◽  
Afrizal Laksita Akbar ◽  
Mohammad Sonhaji Akbar ◽  
Agus Zainal Arifin ◽  
Darlis Herumurti

Naïve Bayes is a data mining method commonly used in text-based document classification. Its advantage is a simple algorithm with low computational complexity. However, it has a weakness: the feature-independence assumption of Naïve Bayes does not always hold, which affects the accuracy of the calculation. Naïve Bayes can therefore be optimized by weighting its features with Gain Ratio. However, weighting Naïve Bayes features causes problems in calculating the probability of each document, because many features in a document do not represent the tested class, so weighted Naïve Bayes alone is still not optimal. This paper proposes an optimization of the Naïve Bayes method that combines Gain Ratio weighting with a feature selection method for text classification. The results show that Naïve Bayes optimized with feature selection and weighting achieves an accuracy of 94%.
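The Gain Ratio weight the abstract refers to is the information gain of a feature with respect to the class, normalized by the feature's own entropy (split information). A minimal numpy sketch of that computation, on invented binary data rather than the paper's corpus:

```python
import numpy as np

def entropy(v):
    # Shannon entropy (bits) of a discrete array.
    _, counts = np.unique(v, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain_ratio(x, y):
    # Information gain of feature x w.r.t. class y, normalized by the
    # feature's own entropy (split information).
    ig = entropy(y) - sum(
        (x == v).mean() * entropy(y[x == v]) for v in np.unique(x)
    )
    split_info = entropy(x)
    return ig / split_info if split_info > 0 else 0.0

y = np.array([1, 1, 0, 0])
x_good = np.array([1, 1, 0, 0])  # perfectly predicts y -> gain ratio 1.0
x_bad = np.array([1, 0, 1, 0])   # independent of y     -> gain ratio 0.0
print(gain_ratio(x_good, y), gain_ratio(x_bad, y))
```

Features with a higher gain ratio would receive a larger weight in the weighted Naïve Bayes model; the normalization by split information keeps many-valued features from being favored automatically.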


2019 ◽  
Vol 2 (2) ◽  
pp. 58 ◽  
Author(s):  
Utomo Pujianto ◽  
Asa Luki Setiawan ◽  
Harits Ar Rosyid ◽  
Ali M. Mohammad Salah

Diabetes is a metabolic disorder in which the pancreas does not produce enough insulin or the body cannot use the insulin it produces effectively. The HbA1c examination, which measures a patient's average glucose level over the last 2-3 months, has become an important step in assessing the condition of diabetic patients. Knowledge of the patient's condition can help medical staff predict patient readmission, i.e., the patient requiring hospitalization services again. The ability to predict readmissions ultimately helps the hospital measure and manage the quality of patient care. This study compares the performance of Naïve Bayes and the C4.5 decision tree in predicting readmissions of diabetic patients, especially patients who have undergone an HbA1c examination. As part of this study we also compare the performance of classification models across a number of scenarios combining preprocessing methods, namely the Synthetic Minority Over-sampling Technique (SMOTE) and a Wrapper feature selection method, with both classification techniques. The scenario combining C4.5 with SMOTE and feature selection produced the best performance in classifying readmissions of diabetic patients, with an accuracy of 82.74%, precision of 87.1%, and recall of 82.7%.
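SMOTE, the oversampling step in the study above, generates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours. Real pipelines typically use the `SMOTE` class from the imbalanced-learn library; the hand-rolled numpy sketch below, on invented 2-D points, only illustrates the core idea:

```python
import numpy as np

rng = np.random.default_rng(0)

def smote(X_min, n_new, k=3):
    # Minimal SMOTE: each synthetic point is a random interpolation between a
    # minority sample and one of its k nearest minority neighbours.
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        nbrs = np.argsort(d)[1:k + 1]  # skip the point itself (distance 0)
        j = rng.choice(nbrs)
        gap = rng.random()             # interpolation fraction in [0, 1)
        out.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(out)

# Hypothetical minority-class points at the corners of the unit square.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X_new = smote(X_min, n_new=4)
print(X_new.shape)  # (4, 2)
```

Because each synthetic point lies on a segment between two real minority samples, the oversampled set stays inside the minority region instead of merely duplicating existing rows.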


2014 ◽  
Vol 2014 ◽  
pp. 1-10 ◽  
Author(s):  
Subhajit Dey Sarkar ◽  
Saptarsi Goswami ◽  
Aman Agarwal ◽  
Javed Aktar

With the proliferation of unstructured data, text classification or text categorization has found many applications in topic classification, sentiment analysis, authorship identification, spam detection, and so on. Many classification algorithms are available, and naïve Bayes remains one of the oldest and most popular. On one hand, naïve Bayes is simple to implement; on the other, it requires relatively little training data. The literature, however, finds that naïve Bayes performs poorly compared to other classifiers in text classification, which makes the classifier unusable in spite of the simplicity and intuitiveness of the model. In this paper, we propose a two-step feature selection method: first a univariate feature selection to reduce the search space, then feature clustering to select relatively independent feature sets. We demonstrate the effectiveness of our method by a thorough evaluation and comparison over 13 datasets. The performance improvement thus achieved makes naïve Bayes comparable or superior to other classifiers, and the proposed algorithm is shown to outperform other traditional methods such as greedy-search-based wrappers or CFS.
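The two-step idea — a cheap univariate filter first, then pruning correlated survivors so the kept features are roughly independent — can be sketched on synthetic data. The variance score and the 0.9 correlation threshold below are stand-ins chosen for illustration, not the paper's actual scoring function:

```python
import numpy as np

rng = np.random.default_rng(1)

# Step 1 (univariate): rank features by a per-feature score (here variance,
# a stand-in for chi-square or similar). Step 2 (clustering): greedily keep
# one feature per group of highly correlated features.
X = rng.random((100, 6))
X[:, 3] = X[:, 0] + 0.01 * rng.random(100)  # feature 3 duplicates feature 0

scores = X.var(axis=0)
top = np.argsort(scores)[::-1][:5]          # step 1: top 5 by score

corr = np.corrcoef(X[:, top], rowvar=False)
kept = []
for idx in range(len(top)):
    if all(abs(corr[idx, j]) < 0.9 for j in kept):
        kept.append(idx)                    # step 2: drop near-duplicates
selected = top[kept]
print(selected)
```

Step 2 is what restores the independence assumption naïve Bayes relies on: of the two nearly identical features 0 and 3, at most one survives.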


2021 ◽  
Vol 5 (3) ◽  
pp. 799
Author(s):  
Fitria Septianingrum ◽  
Agung Susilo Yuda Irawan

In the era of Industry 4.0, the internet has become a necessity of daily life. The high intensity of internet use causes information to spread widely and quickly, and this rapid distribution of information is matched by the growth of digital data, making the public opinions it contains increasingly important. This digital data can be processed with sentiment analysis to obtain useful information about issues developing in the community or to learn public opinion about a company's products. Given the number of sentiment analysis studies that apply the Naive Bayes algorithm, the researchers were interested in studying the use of feature selection with that algorithm. This research was therefore conducted to determine the most effective feature selection method to combine with the Naive Bayes algorithm, using the Systematic Literature Review (SLR) research method. The study concludes that the most effective feature selection method to combine with Naive Bayes is Particle Swarm Optimization (PSO), with an average accuracy of 89.08%.
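PSO feature selection, the method the review found most effective, treats each particle as a binary mask over the features and moves the swarm toward masks with high classifier accuracy. The sketch below uses a toy fitness function in place of an actual Naive Bayes cross-validation score (features 0 and 2 are assumed informative); the inertia and learning coefficients are common textbook defaults, not values from the reviewed studies:

```python
import numpy as np

rng = np.random.default_rng(42)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def fitness(mask):
    # Toy stand-in for "Naive Bayes accuracy on the selected features":
    # features 0 and 2 are assumed informative, the rest are noise.
    return (mask[0] + mask[2]) - 0.5 * (mask[1] + mask[3])

n_particles, n_feats, n_iter = 10, 4, 30
pos = rng.integers(0, 2, (n_particles, n_feats))   # binary feature masks
vel = rng.normal(0, 1, (n_particles, n_feats))
pbest = pos.copy()
gbest = pbest[np.argmax([fitness(p) for p in pbest])].copy()

for _ in range(n_iter):
    r1, r2 = rng.random((2, n_particles, n_feats))
    # Standard velocity update: inertia + cognitive + social terms.
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    # Binary PSO: the sigmoid of the velocity gives the bit-on probability.
    pos = (rng.random((n_particles, n_feats)) < sigmoid(vel)).astype(int)
    for i in range(n_particles):
        if fitness(pos[i]) > fitness(pbest[i]):
            pbest[i] = pos[i]
    gbest = pbest[np.argmax([fitness(p) for p in pbest])].copy()

print(gbest, fitness(gbest))
```

In a real pipeline the fitness call would train and score a Naive Bayes model on the columns selected by the mask, which is what makes PSO feature selection expensive but effective.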


2020 ◽  
Vol 4 (3) ◽  
pp. 504-512
Author(s):  
Faried Zamachsari ◽  
Gabriel Vangeran Saragih ◽  
Susafa'ati ◽  
Windu Gata

The decision to move Indonesia's capital to East Kalimantan received mixed responses on social media; the still-high poverty rate and the country's strained finances are factors behind disapproval of the relocation. Twitter, as a popular social media platform, is used by the public to express these opinions. The goals of this study are to measure the tendency of community responses to the relocation of the national capital and to perform sentiment analysis of that public opinion using the Naive Bayes and Support Vector Machine algorithms with feature selection, seeking the highest accuracy. The sentiment analysis data were crawled from Indonesian-language tweets on Twitter, using the search terms #IbuKotaBaru and #PindahIbuKota. The research stages consisted of collecting data from Twitter, polarity labeling, and preprocessing comprising case transformation, cleansing, tokenizing, filtering, and stemming. Feature selection was used to increase accuracy, after which predetermined ratios split the data into training and testing sets. The final step compared the Support Vector Machine and Naive Bayes methods to determine which is more accurate. In the data period studied, 24.26% of sentiment about the new capital city was positive and 75.74% was negative. Using RapidMiner, the best accuracy for Naive Bayes with feature selection was obtained at a 9:1 ratio, 88.24%, while the best for the Support Vector Machine with feature selection was at a 5:5 ratio, 78.77%.
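The five preprocessing stages named above (case transformation, cleansing, tokenizing, filtering, stemming) can be sketched in plain Python. The stopword list and the suffix-stripping "stemmer" below are toy stand-ins — real Indonesian pipelines typically use a proper stemmer such as Sastrawi — and the example tweet is invented:

```python
import re

STOPWORDS = {"yang", "di", "ke", "dan"}  # toy Indonesian stopword list
SUFFIXES = ("nya", "kan", "an", "i")     # naive suffix stripping, not a real stemmer

def preprocess(tweet):
    text = tweet.lower()                                    # 1. transform case
    text = re.sub(r"http\S+|[@#]\w+|[^a-z\s]", " ", text)   # 2. cleansing
    tokens = text.split()                                   # 3. tokenizing
    tokens = [t for t in tokens if t not in STOPWORDS]      # 4. filtering
    stemmed = []                                            # 5. stemming
    for t in tokens:
        for s in SUFFIXES:
            if t.endswith(s) and len(t) > len(s) + 2:
                t = t[: -len(s)]
                break
        stemmed.append(t)
    return stemmed

print(preprocess("Pemindahan Ibu Kota ke Kalimantan #PindahIbuKota"))
```

The cleansing regex drops URLs, mentions, and hashtags before tokenization, since hashtags like #PindahIbuKota are the crawling keys rather than sentiment-bearing words.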


2021 ◽  
Vol 5 (3) ◽  
pp. 527-533
Author(s):  
Yoga Religia ◽  
Amali Amali

The quality of an airline's services cannot be measured from the company's point of view; it must be seen from the point of view of customer satisfaction. Data mining techniques make it possible to predict airline customer satisfaction with a classification model. The Naïve Bayes algorithm has demonstrated outstanding classification accuracy, but its feature-independence assumption is rarely addressed. Some literature suggests attribute weighting to relax this assumption, which can be done with particle swarm optimization (PSO) and a genetic algorithm (GA) through feature selection. This study compares PSO and GA optimization of Naïve Bayes for classifying the Airline Passenger Satisfaction data taken from www.kaggle.com. After testing, the best performance was obtained by the Naïve Bayes model with PSO optimization, with an accuracy of 86.13%, precision of 87.90%, recall of 87.29%, and AUC of 0.923.
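The GA side of the comparison evolves a population of binary feature masks through selection, crossover, and mutation. As with the PSO case, the fitness function below is a toy stand-in for a Naive Bayes accuracy score (features 0 and 1 are assumed informative), and the population size, mutation rate, and generation count are illustrative defaults:

```python
import numpy as np

rng = np.random.default_rng(7)

def fitness(mask):
    # Toy stand-in for Naive Bayes accuracy on the selected features:
    # features 0 and 1 are assumed informative, the rest only add noise.
    return (mask[0] + mask[1]) - 0.25 * mask[2:].sum()

def ga_select(n_feats=6, pop_size=12, n_gen=25):
    pop = rng.integers(0, 2, (pop_size, n_feats))  # random binary masks
    for _ in range(n_gen):
        scores = np.array([fitness(m) for m in pop])
        order = np.argsort(scores)[::-1]
        parents = pop[order[: pop_size // 2]]      # truncation selection (elitist)
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n_feats)         # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            flip = rng.random(n_feats) < 0.1       # bit-flip mutation
            child[flip] ^= 1
            children.append(child)
        pop = np.vstack([parents, children])
    return max(pop, key=fitness)

best = ga_select()
print(best, fitness(best))
```

Keeping the parents unchanged in each generation (elitism) guarantees the best mask found so far is never lost, which is why GA fitness is monotonically non-decreasing across generations.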

