Optimasi Algoritma Naïve Bayes Classifier untuk Mendeteksi Anomaly dengan Univariate Fitur Selection (Optimizing the Naïve Bayes Classifier Algorithm to Detect Anomalies with Univariate Feature Selection)

2020
Vol 4 (2)
pp. 40-49
Author(s):
Harianto Harianto
Andi Sunyoto
Sudarmawan Sudarmawan
...

Protecting systems and networks from parties without authorized access is among the most important requirements of any system. To keep a system, its data, or its network safe from unauthorized users and other interference, a mechanism is needed to detect such activity. An Intrusion Detection System (IDS) is a method for detecting suspicious activity in a system or network. Classification algorithms from artificial intelligence can be applied to this problem; many are available, one of which is Naïve Bayes. This study aims to optimize Naïve Bayes using univariate feature selection on the UNSW-NB15 data set. Only the 40 most relevant features are retained. The data set is then divided into test and training data at ratios of 10%:90%, 20%:80%, 30%:70%, 40%:60%, and 50%:50%. The experiments show that feature selection affects the accuracy obtained. The highest accuracy is obtained with the 40%:60% split, both with and without feature selection. Naïve Bayes without feature selection reached a highest accuracy of 91.43%, against 91.62% with feature selection, an improvement of 0.19 percentage points.
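The abstract does not say which univariate score the authors used, so the following is only a minimal sketch of the general idea: score each feature on its own against the label and keep the top k. The absolute Pearson correlation used here, the function names, and the toy data are all assumptions made for illustration.

```python
import math

def pearson_abs(xs, ys):
    """Absolute Pearson correlation between one feature column and the labels."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no information about the label
    return abs(cov) / math.sqrt(var_x * var_y)

def select_k_best(rows, labels, k):
    """Return the (sorted) indices of the k features with the highest score."""
    n_features = len(rows[0])
    scores = [pearson_abs([row[j] for row in rows], labels)
              for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])

# Hypothetical toy data: feature 0 tracks the label closely,
# feature 1 is constant, feature 2 is only weakly related.
rows = [[0.1, 5.0, 3.0], [0.2, 5.0, 1.0], [0.9, 5.0, 2.0], [0.8, 5.0, 4.0]]
labels = [0, 0, 1, 1]
selected = select_k_best(rows, labels, k=2)
print(selected)
```

In the study's setting, k would be 40 and the rows would come from UNSW-NB15; the selected columns would then be passed to the Naïve Bayes classifier.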

SinkrOn
2020
Vol 5 (1)
Author(s):  
Miftahul Kahfi Al Fath ◽  
Arini Arini ◽  
Nasrul Hakiem

Sentiment analysis is an important and emerging research topic. It is used to determine whether a person's opinion of an issue or object tends to be negative or positive. The main purpose of this study is to measure public sentiment toward the Full Day School policy, based on comments from the Facebook page of Kemendikbud RI, and to evaluate the performance of the Naïve Bayes Classifier algorithm. The authors used the Naïve Bayes Classifier with character trigram and quadgram feature selection, two different training data models, and training data labeled with a lexicon-based method. The results show that negative public sentiment toward the Full Day School policy outweighs positive or neutral sentiment. The highest accuracy, 80%, was obtained by the Naïve Bayes Classifier with trigram feature selection on the 300-record training data model. Both the amount of training data and the feature selection used affected the accuracy of the Naïve Bayes Classifier.
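Lexicon-based labeling of training data can be sketched as follows. The abstract does not describe the lexicon or the scoring rule, so the word lists and the simple count-difference rule below are hypothetical placeholders, not the authors' method.

```python
# Hypothetical lexicon entries; a real lexicon would be far larger
# and in the language of the comments being labeled.
POSITIVE = {"good", "great", "support"}
NEGATIVE = {"bad", "fail", "reject"}

def lexicon_label(text):
    """Label a text by counting positive and negative lexicon hits."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

labels = [lexicon_label(t) for t in
          ["Great policy and good support", "bad idea", "no comment"]]
print(labels)
```

Labels produced this way would serve as the training targets for the Naïve Bayes classifier.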


Kilat
2020
Vol 9 (1)
pp. 103-114
Author(s):  
Arini - Arini
Luh Kesuma Wardhani
Dimas - Octaviano

In the run-up to the 2019 election year, many mass campaigns were conducted through social media, including Twitter. One very popular online campaign used the hashtag #2019GantiPresiden. Sentiment analysis of the #2019GantiPresiden hashtag requires a classifier and a robust feature selection method that together achieve high accuracy. One such combination is the Naive Bayes Classifier (NBC) with Character Tri-Gram and Term-Frequency feature selection, which previous research found to give fairly high accuracy. The purpose of this study was to implement the NBC algorithm with each feature selection method and to compare the accuracy obtained with the two. The authors collected data by observation and ran simulations. Using 1,000 tweets with the hashtag #2019GantiPresiden collected on 15 September 2018, the authors split the data into 950 tweets for training and 50 tweets for testing, with labels assigned by a lexicon-based sentiment method. The NBC algorithm reached an accuracy of 76% with Character Tri-Gram feature selection and 74% with Term-Frequency, showing that Character Tri-Gram performed better than Term-Frequency.
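The two feature representations compared in this abstract can be illustrated with a short sketch. The preprocessing shown (lowercasing, whitespace tokenization) and the example tweet are assumptions; the abstract does not specify them.

```python
from collections import Counter

def char_trigram_counts(text):
    """Count overlapping character trigrams of the lowercased text."""
    cleaned = " ".join(text.lower().split())  # collapse runs of whitespace
    return Counter(cleaned[i:i + 3] for i in range(len(cleaned) - 2))

def term_frequency(text):
    """Count whole-word occurrences (whitespace tokenization)."""
    return Counter(text.lower().split())

tweet = "ganti presiden ganti"  # hypothetical example text
tf = term_frequency(tweet)
tri = char_trigram_counts(tweet)
print(tf["ganti"], tri["gan"], sum(tri.values()))
```

Character trigrams produce many more, finer-grained features than whole terms, which can help with misspellings and informal spelling in tweets.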


2019
Vol 2 (4) ◽  
pp. 135
Author(s):  
Saipul Anwar ◽  
Fajar Septian ◽  
Ristasari Dwi Septiana

An Intrusion Detection System (IDS) detects attacks or disturbances on a network or information system. Anomaly detection is a type of IDS that detects deviant activity on the network based on statistical probability. Growing internet use also increases interference and attacks from intruders or crackers who exploit weak internet protocols and application software. The large volume of arriving data packets then has to be analyzed, and data mining is a suitable technique for doing so. This study aims to classify IDS anomalies using the Naïve Bayes classification algorithm on attributes chosen by correlation-based feature selection. It uses the UNSW-NB15 intrusion detection data set, consisting of 49 attributes and 321,283 records. Performance is measured by accuracy, precision, F-measure, and ROC area. Correlation-based feature selection leaves 4 attributes. Naïve Bayes classification without prior attribute selection obtained an accuracy of 71.2%, while classification preceded by correlation-based attribute selection obtained 74.8%. The accuracy of the Naïve Bayes algorithm can therefore be improved by first selecting attributes with the correlation technique.
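Three of the performance measures named above (accuracy, precision, and F-measure) follow directly from the confusion-matrix counts. The sketch below uses the standard definitions with hypothetical counts; it does not reproduce the paper's figures, and ROC area is omitted because it needs ranked scores rather than counts.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}

# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
print(metrics)
```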


MATICS
2017
Vol 9 (2)
pp. 53
Author(s):  
Aris Diantoro ◽  
Irwan Budi Santoso

Losses in chicken egg hatcheries reduce breeders' income. The main cause is that distinguishing fertile from infertile eggs is done ineffectively and inefficiently. Automatic detection of fertile and infertile eggs makes selecting and separating them easier, which brings breeders more profit, saves time, and improves sale value. Infertile eggs can still be sold if they are identified early enough, before they fail to hatch. A method combining fuzzy c-means and the naive Bayes classifier is designed to identify the fertility state of eggs. Eggs are placed near a light source against a black background in a dark room and photographed with a high-quality camera. Features that distinguish fertile from infertile eggs are then extracted from the images. The study uses 450 egg images collected in a field survey. The training data comprise 250 images (125 fertile and 125 infertile), and the test data comprise 200 images (150 fertile and 50 infertile). On the training data, the best accuracy was 80% at an interval of 5, 86.4% at an interval of 5 with dimensions 70x60, and 99.6% with a 1x2 resize. On the test data, the accuracies obtained were 78%, 82%, and 94%.


2020
Vol 17 (1) ◽  
pp. 37-42
Author(s):  
Yuris Alkhalifi ◽  
Ainun Zumarniansyah ◽  
Rian Ardianto ◽  
Nila Hardi ◽  
Annisa Elfina Augustia

Non-Cash Food Assistance, or Bantuan Pangan Non-Tunai (BPNT), is government food assistance given to Beneficiary Families (KPM) every month through an electronic account that can be used only to buy food at the Electronic Shop Mutual Assistance Joint Business Group Hope Family Program (e-Warong KUBE PKH) or from food traders working with Bank Himbara. Distribution of BPNT still poses problems for village officials, especially those of Desa Wanasari, who must decide which households are eligible to receive it (poor) and which are not (not poor). Data mining can support this decision. This study compares two algorithms, the Naive Bayes Classifier and the C4.5 decision tree. The sample consists of 200 head-of-household records, divided into 90% training data and 10% test data. The proposed model is built in RapidMiner and evaluated with a confusion matrix to determine which of the two methods is more accurate. The Naive Bayes Classifier reached an accuracy of 98.89% and the C4.5 decision tree 95.00%. The Naive Bayes Classifier is therefore the more accurate algorithm in this study, by 3.89 percentage points.


2017
Vol 5 (8) ◽  
pp. 260-266
Author(s):  
Subhankar Manna ◽  
Malathi G.

The healthcare industry collects a huge amount of unclassified data every day. Effective diagnosis and decision making require discovering hidden patterns in these data. One such data set concerns diabetes, a group of metabolic diseases that vary greatly in their range of attributes. The objective of this paper is to classify a diabetes data set using classification techniques such as Naive Bayes, ID3, and k-means. The secondary objective is to study the performance of the classification algorithms used. The algorithms are implemented in R. The data set is the Diabetes 130-US hospitals for years 1999-2008 Data Set from the UCI Machine Learning Repository. Motivation/Background: Naïve Bayes is a probabilistic classifier based on Bayes' theorem that assumes the variables are independent of each other. It gives useful insight for understanding many other algorithms, and when applied to the diabetes data set it shows high accuracy. The paper also constructs a decision tree from the diabetes data set, in which each internal node tests an attribute, each branch represents an outcome of the test, and each leaf node holds a class label. The tree is built by splitting the observations recursively, a process called recursive partitioning. Decision trees are widely used in many areas because they work well across different data set distributions. For example, the ID3 decision tree algorithm yields a result indicating whether or not a patient has diabetes. Method: Naïve Bayes is used for probabilistic classification and ID3 for the decision tree. Results: The data set has 18 columns, such as Races, Gender, Take_metformin, Take_repaglinide, Insulin, Body_mass_index, and Self_reported_health, and 623 rows.
The Naive Bayes Classifier is used to obtain the probability of having diabetes. Diabetes is the class attribute of the data set, with two values, “Yes” and “No”, and the remaining attributes hold personal information about the patient, such as Races, Gender, Take_metformin, Take_repaglinide, Insulin, Body_mass_index, and Self_reported_health. For each attribute value, the classifier reports the probability associated with “Yes” and with “No”. For example, for Gender, Female has 0.4964 for “No” and 0.5581 for “Yes”, and Male has 0.5035 for “No” and 0.4418 for “Yes”. Conclusions: Two algorithms were implemented in this paper, the Naive Bayes Classifier and ID3. The Naive Bayes Classifier predicted the probability of having diabetes, and ID3 generated a decision tree.
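To show how a conditional probability table like the Gender example feeds into a Naive Bayes prediction, here is a minimal sketch. It reads the four Gender figures as P(Gender | class), which is consistent with each class's pair summing to roughly 1. The class priors are not given in the abstract, so the 0.5/0.5 priors below are purely hypothetical, as are the function and variable names.

```python
def posterior(observed, priors, cond):
    """Naive Bayes posterior over classes.

    observed: {attribute: value}
    priors:   {class: P(class)}
    cond:     {class: {attribute: {value: P(value | class)}}}
    """
    joint = {}
    for cls, p in priors.items():
        # Independence assumption: multiply per-attribute likelihoods.
        for attr, val in observed.items():
            p *= cond[cls][attr][val]
        joint[cls] = p
    total = sum(joint.values())
    return {cls: p / total for cls, p in joint.items()}

# Conditional probabilities quoted in the abstract for the Gender attribute.
cond = {
    "Yes": {"Gender": {"Female": 0.5581, "Male": 0.4418}},
    "No": {"Gender": {"Female": 0.4964, "Male": 0.5035}},
}
priors = {"Yes": 0.5, "No": 0.5}  # hypothetical: not stated in the abstract

post = posterior({"Gender": "Female"}, priors, cond)
print(post)
```

With more attributes, each one contributes another factor to the product; the class with the largest posterior is the prediction.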

