An Intrusion Detection Model based on Hybrid Classification algorithm

2018 ◽  
Vol 246 ◽  
pp. 03027
Author(s):  
Manfu Ma ◽  
Wei Deng ◽  
Hongtong Liu ◽  
Xinmiao Yun

Because a single classification algorithm cannot meet the performance requirements of intrusion detection, this paper proposes KNN-NB, an intrusion detection model based on a hybrid of the KNN and Naive Bayes classification algorithms. It combines the strength of KNN in handling numerical values with the strength of Naive Bayes in exploiting the structure of the data. The model first preprocesses the NSL-KDD intrusion detection data set. Then, exploiting the advantages of the KNN algorithm on data values, it calculates the distance between samples according to their feature items and selects the K samples with the smallest distance. Finally, Naive Bayes is applied to obtain the final result. Experimental results on the NSL-KDD data set show that the KNN-NB algorithm achieves more balanced performance than the traditional KNN and Naive Bayes algorithms in terms of accuracy, sensitivity, false detection rate, specificity, and missed detection rate.
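The two-stage idea in this abstract (pick the K nearest samples, then let Naive Bayes decide among them) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Euclidean distance, the Gaussian form of Naive Bayes, the variance floor `eps`, and the toy two-feature data are all assumptions, since the abstract does not specify them.

```python
import math
from collections import defaultdict

def k_nearest(train, query, k):
    """Return the k training rows closest to `query` (Euclidean distance, assumed)."""
    return sorted(train, key=lambda row: math.dist(row[0], query))[:k]

def gaussian_nb_predict(neighbours, query, eps=1e-6):
    """Fit a Gaussian naive Bayes model on `neighbours` only and classify `query`."""
    by_class = defaultdict(list)
    for features, label in neighbours:
        by_class[label].append(features)
    best_label, best_score = None, -math.inf
    for label, rows in by_class.items():
        score = math.log(len(rows) / len(neighbours))  # log prior within the neighbourhood
        for j, x in enumerate(query):
            values = [r[j] for r in rows]
            mean = sum(values) / len(values)
            # eps keeps the variance positive when a class has a single neighbour
            var = sum((v - mean) ** 2 for v in values) / len(values) + eps
            score += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def knn_nb_predict(train, query, k=5):
    """Stage 1: select the k nearest samples. Stage 2: naive Bayes on those samples."""
    return gaussian_nb_predict(k_nearest(train, query, k), query)

# Hypothetical toy data: (features, label)
toy_train = [
    ((0.0, 0.1), "normal"), ((0.2, 0.0), "normal"), ((0.1, 0.3), "normal"),
    ((5.0, 5.1), "attack"), ((5.2, 4.9), "attack"), ((4.8, 5.3), "attack"),
]
print(knn_nb_predict(toy_train, (0.1, 0.1), k=3))
```

Fitting the Bayes model only on the neighbourhood keeps the second stage cheap, at the cost of very small per-class samples, which is why the sketch adds a variance floor.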

2017 ◽  
Vol 5 (8) ◽  
pp. 260-266
Author(s):  
Subhankar Manna ◽  
Malathi G.

The healthcare industry collects a huge amount of unclassified data every day. For effective diagnosis and decision making, we need to discover hidden patterns in the data. One such data set concerns a group of metabolic diseases whose attributes vary greatly in range. The objective of this paper is to classify a diabetes data set using classification techniques such as Naive Bayes, ID3, and k-means. The secondary objective is to study the performance of the classification algorithms used in this work. We propose to implement the classification algorithms using an R package. This work uses the Diabetes 130-US hospitals for years 1999-2008 Data Set, imported from the UCI Machine Learning Repository. Motivation/Background: Naïve Bayes is a probabilistic classifier based on Bayes' theorem. It offers a useful perspective for understanding many algorithms. It assumes that the variables are independent of each other. In this paper, the Bayesian algorithm shows high accuracy when applied to the diabetes data set. We also construct a decision tree from the diabetes data set. The tree is a graph-like model in which an attribute is selected at each internal node, each branch represents an outcome of the test on that attribute, and each leaf node holds a class label. The technique separates observations into branches to construct the tree, splitting it recursively in a process called recursive partitioning. Decision trees are widely used in many areas because they are adequate for a wide range of data distributions. For example, the ID3 (decision tree) algorithm produces a result indicating whether or not a patient has diabetes. Method: We use Naïve Bayes for probabilistic classification and ID3 for the decision tree. Results: The data set relates to diabetes. It has 623 rows and 18 columns, including Races, Gender, Take_metformin, Take_repaglinide, Insulin, Body_mass_index, and Self_reported_health. The Naive Bayes classifier is used to obtain the probability of having diabetes or not. Diabetes is the class attribute of the data set, with two values, “Yes” and “No”; the remaining columns hold personal information about the patient. The conditional probabilities for “Yes” and “No” are given below. For example, for Gender, Female has 0.4964 for “No” and 0.5581 for “Yes”, and Male has 0.5035 for “No” and 0.4418 for “Yes”. Conclusions: In this paper two algorithms were implemented, the Naive Bayes classifier and ID3. The Naive Bayes classifier was used to predict the probability of having diabetes, and the ID3 algorithm was used to generate a decision tree.
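To illustrate how per-class probabilities like the Gender figures above feed into a Naive Bayes prediction, the sketch below applies Bayes' rule to the two reported Female values. The equal class priors are an assumption made only for illustration; the abstract does not report them, and the helper name `posterior` is hypothetical.

```python
def posterior(likelihoods, priors):
    """Bayes' rule: P(class | evidence) from P(evidence | class) and P(class)."""
    joint = {c: likelihoods[c] * priors[c] for c in priors}
    total = sum(joint.values())
    return {c: joint[c] / total for c in joint}

# P(Gender = Female | class), as reported in the abstract.
female_given_class = {"Yes": 0.5581, "No": 0.4964}

# Assumption: equal class priors (not given in the abstract).
assumed_priors = {"Yes": 0.5, "No": 0.5}

result = posterior(female_given_class, assumed_priors)
print(result)
```

With more than one attribute, Naive Bayes multiplies the per-attribute likelihoods for each class before normalizing, which is where the independence assumption enters.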


Tech-E ◽  
2021 ◽  
Vol 4 (2) ◽  
pp. 44
Author(s):  
Rino Rino

Heart disease is a condition in which fatty deposits in the coronary arteries of the heart change the function and shape of the arteries so that blood flow to the heart is obstructed. Data mining methods can predict this disease; two methods often used in research are the C4.5 algorithm and Naive Bayes. The data set in this research was obtained from the UCI Machine Learning Repository and has 3546 records and 13 attributes. The Naïve Bayes algorithm achieved an accuracy of 81.40%, higher than the C4.5 algorithm, which achieved only 79.07%. Based on the calculation results, it can be concluded that the Naïve Bayes algorithm is a very good classifier because its value lies between 0.709 and 1.00. Since the Naïve Bayes algorithm has a higher accuracy than the C4.5 algorithm, the researchers decided to use it to predict heart disease.


Author(s):  
Sachin Sabloak ◽  
Jasuandi Wijaya ◽  
Abdul Rahman ◽  
Molavi Arman

Because computer networks are so important today, the network in use needs to be stable. A network administrator monitors the quality of the internet connection within a LAN in order to derive values from the collected data. This research applied the Naive Bayes algorithm, using a TIPHON data set with the parameters of the QoS method (delay, packet loss, and jitter), to monitor the quality of the internet network. The QoS method yields a value for each parameter needed for network monitoring, and the Naive Bayes algorithm is then used to reach a conclusion about the status of the internet network. Quality of Service (QoS) is a method for defining the capability of a network and is used to measure network quality. Naive Bayes is used because it classifies using probability and statistics and can make decisions from a provided data set. The aim of this research is to determine the status of the internet network in the STMIK Global Informatika MDP computer laboratory and the accuracy of the Naive Bayes algorithm in classifying that status. Testing was carried out in the STMIK Global Informatika MDP computer laboratory. The results show that Naive Bayes achieved an accuracy of 87.78% and that the status of the internet network in the laboratory falls into the satisfactory category, which was the dominant value at 47.78%. Keywords: Naive Bayes, network administrator, Quality of Service (QoS), internet network status.
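As a rough illustration of the three QoS parameters named in this abstract, the sketch below derives delay, jitter, and packet loss from per-packet timestamps. The definitions used here (mean one-way delay, mean absolute difference between consecutive delays, percentage of packets not received) are common conventions and are assumptions; the abstract does not state how the authors computed them, and the sample timestamps are invented.

```python
def qos_parameters(sent, received):
    """sent / received: {packet_id: time_ms}. Packets missing from `received` count as lost."""
    delays = [received[p] - sent[p] for p in sorted(sent) if p in received]
    loss_pct = 100.0 * (len(sent) - len(delays)) / len(sent)
    mean_delay = sum(delays) / len(delays)
    # Jitter taken as the mean absolute change between consecutive delays (assumed definition).
    diffs = [abs(b - a) for a, b in zip(delays, delays[1:])]
    jitter = sum(diffs) / len(diffs) if diffs else 0.0
    return {"delay_ms": mean_delay, "jitter_ms": jitter, "packet_loss_pct": loss_pct}

# Hypothetical capture: packet 4 is never received.
sent_times = {1: 0, 2: 20, 3: 40, 4: 60, 5: 80}
received_times = {1: 30, 2: 55, 3: 68, 5: 112}

qos = qos_parameters(sent_times, received_times)
print(qos)
```

Values such as these would then serve as the input features that a classifier like Naive Bayes maps to a network status category.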


2019 ◽  
Vol 17 (2) ◽  
pp. 215-224
Author(s):  
Mohammed Tabash ◽  
Mohamed Abd Allah ◽  
Bella Tawfik

The rise in security threats and in attacks on computer networks is one of the most dangerous issues that must be addressed today. Intrusion Detection Systems (IDSs) are the most appropriate means of preventing and detecting attacks on networks and computer systems. This study presents several techniques for discovering network anomalies using data mining, machine learning, and artificial intelligence techniques. In this research, a smart hybrid model was developed to detect any penetration inside the network. The model is divided into two basic stages. The first stage uses a Genetic Algorithm (GA) for feature selection, together with discretization and dimensionality reduction through Proportional K-Interval Discretization (PKID) and Fisher Linear Discriminant Analysis (FLDA), respectively. At the end of the first stage, a Naïve Bayes classifier (NB) and a Decision Table (DT) are combined, using the NSL-KDD data set divided into two separate groups for training and testing. The second stage depends entirely on the outputs of the first stage (the predicted class), which are reclassified with multilayer perceptrons using Deep Learning4J (DL) and the Stochastic Gradient Descent (SGD) algorithm. The goal is to improve classification accuracy for penetrations, raise the detection rate, and reduce false alarms. A comparison shows that the proposed model outperforms both conventional models and previous conventional hybrid models. The proposed model achieves a classification accuracy of 99.9325, a detection rate of 99.9738, and a false alarm rate of 0.00093.
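The figures quoted in intrusion detection abstracts such as this one (accuracy, detection rate, false alarm rate) are normally derived from a binary confusion matrix. The sketch below uses the standard definitions; the counts are invented for illustration and are not taken from any of the papers listed here.

```python
def detection_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics, treating 'attack' as the positive class."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "detection_rate": tp / (tp + fn),         # also called sensitivity or recall
        "false_alarm_rate": fp / (fp + tn),       # normal traffic flagged as attack
        "specificity": tn / (tn + fp),
        "missed_detection_rate": fn / (fn + tp),  # attacks classified as normal
    }

# Hypothetical counts: 100 attacks, 100 normal connections.
metrics = detection_metrics(tp=90, fp=5, tn=95, fn=10)
print(metrics)
```

Note that specificity and false alarm rate sum to one, as do detection rate and missed detection rate, so reporting one of each pair is sufficient.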


Author(s):  
Zena Abdulmunim Aziz ◽  
Adnan Mohsin Abdulazeez

The rapid development of technology, while making life easier, has exposed several security concerns. The growth of the Internet over the years has increased the number of attacks carried out over it. Intrusion Detection Systems (IDS) provide one supporting layer of data protection; they promote a healthy market climate and guard against suspicious activity in the network. Recently, IDS have used Machine Learning (ML) to recognize and distinguish security risks. This paper presents a comparative analysis of different ML algorithms used in IDS and aims to identify intrusions with SVM, J48, and Naive Bayes. Intrusions are also classified. The algorithms were applied to the KDD-CUP data set, and their performance was checked with the WEKA software. The comparison of J48, SVM, and Naïve Bayes showed that J48 had the highest accuracy (99.96%).


2019 ◽  
Vol 8 (2S11) ◽  
pp. 2684-2687 ◽  

The Web is one of the richest sources of consumer reviews and opinions. Many websites contain customer opinions in the form of reviews, blogs, discussion groups, and forums. This project focuses on customer reviews of restaurants. It predicts whether a given comment is positive or negative using supervised machine learning techniques. The project uses a data set from the Kaggle website, consisting of comments and the type of each comment (positive or negative). It studies classification algorithms and text mining approaches for identifying the type of comment. First, duplicates are removed from the data set. This is followed by text pre-processing, which involves removing punctuation marks, removing stop words, and then converting the whole text into vector format. The conversion from text to vectors is an essential step because English text cannot be used directly in an analysis based on linear algebra. CountVectorizer is used to convert the data to vector format. Finally, the Naive Bayes algorithm is used for classification, assigning each comment to one of the two classes mentioned above. 70 percent of the data is used as the training set and 30 percent as the test set.
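The steps described in this abstract (duplicate removal, punctuation and stop-word removal, word counts, a 70/30 split, Naive Bayes) can be sketched in plain Python as follows. This is a stand-in under stated assumptions, not the project's code: the word counting below replaces CountVectorizer with a simple `Counter`, the stop-word list is a tiny illustrative sample, the multinomial model with add-one smoothing is an assumed variant of Naive Bayes, and the reviews are invented.

```python
import math
import random
import string
from collections import Counter

STOP_WORDS = {"the", "a", "an", "is", "was", "and", "it", "this", "to", "of"}  # illustrative only

def tokenize(text):
    """Lowercase, strip punctuation, drop stop words."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [w for w in text.split() if w not in STOP_WORDS]

def train_nb(samples):
    """samples: list of (tokens, label). Returns class counts, per-class word counts, vocabulary."""
    class_counts = Counter(label for _, label in samples)
    word_counts = {label: Counter() for label in class_counts}
    for tokens, label in samples:
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab

def predict_nb(model, tokens):
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)
        for w in tokens:
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical reviews; the last entry duplicates the first.
reviews = [
    ("The food was great and tasty!", "positive"),
    ("Great service, friendly staff.", "positive"),
    ("Loved the tasty pasta.", "positive"),
    ("Friendly staff and great pizza.", "positive"),
    ("Delicious dessert, loved it!", "positive"),
    ("The food was cold and bland.", "negative"),
    ("Terrible service, rude staff.", "negative"),
    ("Bland pasta, awful sauce.", "negative"),
    ("Rude waiter and cold pizza.", "negative"),
    ("Awful dessert, terrible!", "negative"),
    ("The food was great and tasty!", "positive"),
]

unique_reviews = list(dict.fromkeys(reviews))  # remove exact duplicates, keep order
shuffled = unique_reviews[:]
random.Random(0).shuffle(shuffled)             # fixed seed for a repeatable split
split = len(shuffled) * 7 // 10                # 70 percent train, 30 percent test
train_set, test_set = shuffled[:split], shuffled[split:]

model = train_nb([(tokenize(text), label) for text, label in train_set])
for text, label in test_set:
    print(label, "->", predict_nb(model, tokenize(text)))
```

With a data set this small the held-out predictions are not meaningful; the point is only the order of the steps.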


Author(s):  
Sandipan Roy ◽  
Apurbo Mandal ◽  
Debraj Dey

Going digital involves networking a large number of connected devices, so network security becomes a critical task for everyone. An intrusion detection system (IDS) can help detect malicious activity in a system or network. In general, however, intrusion detection systems are neither reliable nor sustainable, and they require considerable resources. In recent years, many machine learning methods have been proposed to give higher accuracy with minimal false alerts, but analyzing such huge volumes of traffic data is still challenging. In this article, we propose a technique that uses the Support Vector Machine and Naive Bayes algorithms to solve the classification problem of intrusion detection. To evaluate the proposed method, we use the NSL-KDD and UNSW-NB15 data sets. The results show that SVM works better than the Naive Bayes algorithm on these data sets.

