Influence of pre-processing on anomaly-based intrusion detection

Introduction/purpose: The anomaly-based intrusion detection system detects intrusions based on a reference model which identifies the normal behavior of a computer network and flags an anomaly. Machine-learning models classify intrusions or misuse as either normal or anomaly. In complex computer networks, the number of training records is large, which makes the evaluation of the classifiers computationally expensive. Methods: A feature selection algorithm that reduces the dataset size is presented in this paper. Results: The experiments are conducted on the Kyoto 2006+ dataset and four classifier models: feedforward neural network, k-nearest neighbor, weighted k-nearest neighbor, and medium decision tree. The results show high accuracy of the models, as well as low false positive and false negative rates. Conclusion: The three-step pre-processing algorithm for feature selection and instance normalization resulted in improving performances of four binary classifiers and in decreasing processing time.

Download Full-text

Anomaly-Based Intrusion Detection: Feature Selection and Normalization Influence to the Machine Learning Models Accuracy

European Journal of Engineering and Formal Sciences ◽

10.26417/ejef.v2i3.p101-106 ◽

2018 ◽

Vol 2 (3) ◽

pp. 101

Author(s):

Danijela Protić ◽

Miomir Stanković

Keyword(s):

Feature Selection ◽

Intrusion Detection ◽

Decision Tree ◽

Network Traffic ◽

Intrusion Detection System ◽

Nearest Neighbor ◽

Reference Model ◽

Detection System ◽

Support Vector ◽

K Nearest Neighbor

Anomaly-based intrusion detection system detects intrusion to the computer network based on a reference model that has to be able to identify its normal behavior and flag what is not normal. In this process network traffic is classified into two groups by adding different labels to normal and malicious behavior. Main disadvantage of anomaly-based intrusion detection system is necessity to learn the difference between normal and not normal. Another disadvantage is the complexity of datasets which simulate realistic network traffic. Feature selection and normalization can be used to reduce data complexity and decrease processing runtime by selecting a better feature space This paper presents the results of testing the influence of feature selection and instances normalization to the classification performances of k-nearest neighbor, weighted k-nearest neighbor, support vector machines and decision tree models on 10 days records of the Kyoto 2006+ dataset. The data was pre-processed to remove all categorical features from the dataset. The resulting subset contained 17 features. Features containing instances which could not be normalized into the range [-1, 1] have also been removed. The resulting subset consisted of nine features. The feature ‘Label’ categorized network traffic to two classes: normal (1) and malicious (0). The performance metric to evaluate models was accuracy. Proposed method resulted in very high accuracy values with Decision Tree giving highest values for not-normalized and with k-nearest neighbor giving highest values for normalized data.Keywords: feature selection, normalization, k-NN, weighted k-NN, SVM, decision tree, Kyoto 2006+

Download Full-text

A novel ensemble modeling for intrusion detection system

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v10i2.pp1963-1971 ◽

2020 ◽

Vol 10 (2) ◽

pp. 1963

Author(s):

Pullagura Indira Priyadarsini ◽

G. Anuradha

Keyword(s):

Feature Selection ◽

Intrusion Detection ◽

Intrusion Detection System ◽

Nearest Neighbor ◽

Detection System ◽

Distance Functions ◽

Classification Model ◽

Support Vector ◽

K Nearest Neighbor ◽

Data Set

Vast increase in data through internet services has made computer systems more vulnerable and difficult to protect from malicious attacks. Intrusion detection systems (IDSs) must be more potent in monitoring intrusions. Therefore an effectual Intrusion Detection system architecture is built which employs a facile classification model and generates low false alarm rates and high accuracy. Noticeably, IDS endure enormous amounts of data traffic that contain redundant and irrelevant features, which affect the performance of the IDS negatively. Despite good feature selection approaches leads to a reduction of unrelated and redundant features and attain better classification accuracy in IDS. This paper proposes a novel ensemble model for IDS based on two algorithms Fuzzy Ensemble Feature selection (FEFS) and Fusion of Multiple Classifier (FMC). FEFS is a unification of five feature scores. These scores are obtained by using feature-class distance functions. Aggregation is done using fuzzy union operation. On the other hand, the FMC is the fusion of three classifiers. It works based on Ensemble decisive function. Experiments were made on KDD cup 99 data set have shown that our proposed system works superior to well-known methods such as Support Vector Machines (SVMs), K-Nearest Neighbor (KNN) and Artificial Neural Networks (ANNs). Our examinations ensured clearly the prominence of using ensemble methodology for modeling IDSs. And hence our system is robust and efficient.

Download Full-text

Intrusion Detection System Berbasis Seleksi Fitur Dengan Kombinasi Filter Information Gain Ratio Dan Correlation

Jurnal Teknologi Informasi dan Ilmu Komputer ◽

10.25126/jtiik.0813154 ◽

2021 ◽

Vol 8 (3) ◽

pp. 457

Author(s):

Nitami Lestari Putri ◽

Radityo Adi Nugroho ◽

Rudy Herteno

Keyword(s):

Feature Selection ◽

Intrusion Detection ◽

Intrusion Detection System ◽

Nearest Neighbor ◽

Information Gain ◽

Detection System ◽

Feature Ranking ◽

K Nearest Neighbor ◽

Gain Ratio ◽

Information Gain Ratio

Intrusion Detection System merupakan suatu sistem yang dikembangkan untuk memantau dan memfilter aktivitas jaringan dengan mengidentifikasi serangan. Karena jumlah data yang perlu diperiksa oleh IDS sangat besar dan banyaknya fitur-fitur asing yang dapat membuat proses analisis menjadi sulit untuk mendeteksi pola perilaku yang mencurigakan, maka IDS perlu mengurangi jumlah data yang akan diproses dengan cara mengurangi fitur yang dapat dilakukan dengan seleksi fitur. Pada penelitian ini mengkombinasikan dua metode perangkingan fitur yaitu Information Gain Ratio dan Correlation dan mengklasifikasikannya menggunakan algoritma K-Nearest Neighbor. Hasil perankingan dari kedua metode dibagi menjadi dua kelompok. Pada kelompok pertama dicari nilai mediannya dan untuk kelompok kedua dihapus. Lalu dilakukan klasifikasi K-Nearest Neighbor dengan menggunakan 10 kali validasi silang dan dilakukan pengujian dengan nilai k=5. Penerapan pemodelan yang diusulkan menghasilkan akurasi tertinggi sebesar 99.61%. Sedangkan untuk akurasi tanpa seleksi fitur menghasilkan akurasi tertinggi sebesar 99.59%. AbstractIntrusion Detection System is a system that was developed for monitoring and filtering activity in network with identified of attack. Because of the amount of the data that need to be checked by IDS is very large and many foreign feature that can make the analysis process difficult for detection suspicious pattern of behavior, so that IDS need for reduce amount of the data to be processed by reducing features that can be done by feature selection. In this study, combines two methods of feature ranking is Information Gain Ratio and Correlation and classify it using K-Nearest Neighbor algorithm. The result of feature ranking from the both methods divided into two groups. in the first group searched for the median value and in the second group is removed. Then do the classification of K-Nearest Neighbor using 10 fold cross validation and do the tests with values k=5. The result of the proposed modelling produce the highest accuracy of 99.61%. While the highest accuracy value of the not using the feature selection is 99.59%.

Download Full-text

Feature Selection Based on Cross-Correlation for the Intrusion Detection System

Security and Communication Networks ◽

10.1155/2020/8875404 ◽

2020 ◽

Vol 2020 ◽

pp. 1-17

Author(s):

Gholamreza Farahani

Keyword(s):

Feature Selection ◽

Intrusion Detection ◽

Computer Networks ◽

Cross Correlation ◽

Nearest Neighbor ◽

Detection System ◽

Support Vector ◽

K Nearest Neighbor ◽

Detection Systems ◽

Correlation Based Feature Selection

One of the important issues in the computer networks is security. Therefore, trusted communication of information in computer networks is a critical point. To have a safe communication, it is necessary that, in addition to the prevention mechanisms, intrusion detection systems (IDSs) are used. There are various approaches to utilize intrusion detection, but any of these systems is not complete. In this paper, a new cross-correlation-based feature selection (CCFS) method is proposed and compared with the cuttlefish algorithm (CFA) and mutual information-based feature selection (MIFS) features with use of four different classifiers: support vector machine (SVM), naive Bayes (NB), decision tree (DT), and K-nearest neighbor (KNN). The experimental results on the KDD Cup 99, NSL-KDD, AWID, and CIC-IDS2017 datasets show that the proposed method has a better performance in accuracy, precision, recall, and F1-score criteria in comparison with the other two methods in different classifiers. Also, the results on different classifiers show that the usage of the DT classifier for the proposed method is the best.

Download Full-text

Detection of Anomalies in the Computer Network Behaviour

European Journal of Engineering and Formal Sciences ◽

10.26417/ejef.v4i1.p7-13 ◽

2020 ◽

Vol 4 (1) ◽

pp. 7

Author(s):

Danijela Protić ◽

Miomir Stanković

Keyword(s):

Intrusion Detection ◽

Intrusion Detection System ◽

Computer Network ◽

Reference Model ◽

Detection System ◽

Supervised Machine Learning ◽

Support Vector ◽

Normal Behaviour ◽

Detection Systems ◽

Binary Classifiers

The goal of anomaly-based intrusion detection is to build a system which monitors computer network behaviour and generates alerts if either a known attack or an anomaly is detected. Anomaly-based intrusion detection system detects intrusions based on a reference model which identifies normal behaviour of the computer network and flags an anomaly. Basic challenges in anomaly-based detection are difficulties to identify a ‘normal’ network behaviour and complexity of the dataset needed to train the intrusion detection system. Supervised machine learning can be used to train the binary classifiers in order to recognize the notion of normality. In this paper we present an algorithm for feature selection and instances normalization which reduces the Kyoto 2006+ dataset in order to increase accuracy and decrease time for training, testing and validating intrusion detection systems based on five models: k-Nearest Neighbour (k-NN), weighted k-NN (wk-NN), Support Vector Machine (SVM), Decision Tree, and Feedforward Neural Network (FNN).

Download Full-text

Optimizing Error Rate in Intrusion Detection System Using Artificial Neural Network Algorithm

International Journal of Emerging Research in Management and Technology ◽

10.23956/ijermt.v6i9.102 ◽

2018 ◽

Vol 6 (9) ◽

pp. 152

Author(s):

S. Vijaya Rani ◽

G. N. K. Suresh Babu

Keyword(s):

Neural Network ◽

Artificial Neural Network ◽

Intrusion Detection ◽

Error Rate ◽

Learning Process ◽

Nearest Neighbor ◽

Detection System ◽

Support Vector ◽

K Nearest Neighbor ◽

Artificial Neural

The illegal hackers penetrate the servers and networks of corporate and financial institutions to gain money and extract vital information. The hacking varies from one computing system to many system. They gain access by sending malicious packets in the network through virus, worms, Trojan horses etc. The hackers scan a network through various tools and collect information of network and host. Hence it is very much essential to detect the attacks as they enter into a network. The methods available for intrusion detection are Naive Bayes, Decision tree, Support Vector Machine, K-Nearest Neighbor, Artificial Neural Networks. A neural network consists of processing units in complex manner and able to store information and make it functional for use. It acts like human brain and takes knowledge from the environment through training and learning process. Many algorithms are available for learning process This work carry out research on analysis of malicious packets and predicting the error rate in detection of injured packets through artificial neural network algorithms.

Download Full-text

Supervised Classifier Approach for Intrusion Detection on KDD with Optimal MapReduce Framework Model in Cloud Computing

Recent Patents on Computer Science ◽

10.2174/1573401315666190619113510 ◽

2019 ◽

Vol 12 ◽

Author(s):

M. Ilayaraja ◽

S. Hemalatha ◽

P. Manickam ◽

K. Sathesh Kumar ◽

K. Shankar

Keyword(s):

Machine Learning ◽

Cloud Computing ◽

Intrusion Detection ◽

Decision Tree ◽

Learning Strategies ◽

Nearest Neighbor ◽

Detection System ◽

K Nearest Neighbor ◽

Mapreduce Model ◽

The Web

Cloud computing is characterized as the arrangement of assets or administrations accessible through the web to the clients on their request by cloud providers. It communicates everything as administrations over the web in view of the client request, for example operating system, organize equipment, storage, assets, and software. Nowadays, Intrusion Detection System (IDS) plays a powerful system, which deals with the influence of experts to get actions when the system is hacked under some intrusions. Most intrusion detection frameworks are created in light of machine learning strategies. Since the datasets, this utilized as a part of intrusion detection is Knowledge Discovery in Database (KDD). In this paper detect or classify the intruded data utilizing Machine Learning (ML) with the MapReduce model. The primary face considers Hadoop MapReduce model to reduce the extent of database ideal weight decided for reducer model and second stage utilizing Decision Tree (DT) classifier to detect the data. This DT classifier comprises utilizing an appropriate classifier to decide the class labels for the non-homogeneous leaf nodes. The decision tree fragment gives a coarse section profile while the leaf level classifier can give data about the qualities that influence the label inside a portion. From the proposed result accuracy for detection is 96.21% contrasted with existing classifiers, for example, Neural Network (NN), Naive Bayes (NB) and K Nearest Neighbor (KNN).

Download Full-text

A Feature Selection Approach for Network Intrusion Detection Based on Tree-Seed Algorithm and K-Nearest Neighbor

2018 IEEE 4th International Symposium on Wireless Systems within the International Conferences on Intelligent Data Acquisition and Advanced Computing Systems (IDAACS-SWS) ◽

10.1109/idaacs-sws.2018.8525522 ◽

2018 ◽

Cited By ~ 3

Author(s):

Feng Chen ◽

Zhiwei Ye ◽

Chunzhi Wang ◽

Lingyu Yan ◽

Ruoxi Wang

Keyword(s):

Feature Selection ◽

Intrusion Detection ◽

Nearest Neighbor ◽

Network Intrusion Detection ◽

K Nearest Neighbor ◽

Network Intrusion ◽

Selection Approach ◽

Feature Selection Approach ◽

Tree Seed

Download Full-text

Scrutinizing Attacks and Evaluating Performance Appraisal Parameters via Feature Selection in Intrusion Detection System

10.21203/rs.3.rs-748765/v1 ◽

2021 ◽

Author(s):

Navroop Kaur ◽

Meenakshi Bansal ◽

Sukhwinder Singh S

Keyword(s):

Feature Selection ◽

Performance Evaluation ◽

Intrusion Detection ◽

Intrusion Detection System ◽

Detection System ◽

Denial Of Service ◽

Cyber Attacks ◽

Support Vector ◽

K Nearest Neighbor ◽

Evaluation Parameters

Abstract In modern times the firewall and antivirus packages are not good enough to protect the organization from numerous cyber attacks. Computer IDS (Intrusion Detection System) is a crucial aspect that contributes to the success of an organization. IDS is a software application responsible for scanning organization networks for suspicious activities and policy rupturing. IDS ensures the secure and reliable functioning of the network within an organization. IDS underwent huge transformations since its origin to cope up with the advancing computer crimes. The primary motive of IDS has been to augment the competence of detecting the attacks without endangering the performance of the network. The research paper elaborates on different types and different functions performed by the IDS. The NSL KDD dataset has been considered for training and testing. The seven prominent classifiers LR (Logistic Regression), NB (Naïve Bayes), DT (Decision Tree), AB (AdaBoost), RF (Random Forest), kNN (k Nearest Neighbor), and SVM (Support Vector Machine) have been studied along with their pros and cons and the feature selection have been imposed to enhance the reading of performance evaluation parameters (Accuracy, Precision, Recall, and F1Score). The paper elaborates a detailed flowchart and algorithm depicting the procedure to perform feature selection using XGB (Extreme Gradient Booster) for four categories of attacks: DoS (Denial of Service), Probe, R2L (Remote to Local Attack), and U2R (User to Root Attack). The selected features have been ranked as per their occurrence. The implementation have been conducted at five different ratios of 60-40%, 70-30%, 90-10%, 50-50%, and 80-20%. Different classifiers scored best for different performance evaluation parameters at different ratios. NB scored with the best Accuracy and Recall values. DT and RF consistently performed with high accuracy. NB, SVM, and kNN achieved good F1Score.

Download Full-text

FEATURE SELECTION AND CLASSIFICATION OF INTRUSION DETECTION SYSTEM USING ROUGH SET

International Journal of Communication Networks and Security ◽

10.47893/ijcns.2013.1085 ◽

2013 ◽

pp. 48-51

Author(s):

NIKITA GUPTA ◽

NARENDER SINGH ◽

VIJAY SHARMA ◽

TARUN SHARMA ◽

AMAN SINGH BHANDARI

Keyword(s):

Feature Selection ◽

Intrusion Detection ◽

Rough Set ◽

Intrusion Detection System ◽

Rough Set Theory ◽

Computer Network ◽

Detection System ◽

Classification Rules ◽

Training Set

With the expansion of computer network there is a challenge to compete with the intruders who can easily break into the system. So it becomes a necessity to device systems or algorithms that can not only detect intrusion but can also improve the detection rate. In this paper we propose an intrusion detection system that uses rough set theory for feature selection, which is extraction of relevant attributes from the entire set of attributes describing a data packet and used the same theory to classify the packet if it is normal or an attack. After the simplification of the discernibility matrix we were to select or reduce the features. We have used Rosetta tool to obtain the reducts and classification rules. NSL KDD dataset is used as training set and is provided to Rosetta to obtain the classification rules.

Download Full-text