Dynamic Network Traffic Data Classification for Intrusion Detection Using Genetic Algorithm

Network intrusion detection systems (NIDS) detect malicious and intrusive information in computer networks. Current commercial NIDS are based on machine learning approaches whose complex algorithms increase the efficiency and efficacy of intrusion detection. These machine learning-based NIDS operate on high-dimensional network traffic data, from which intrusive information must be detected. This data needs to be preprocessed and normalized to make it suitable for machine learning tools, and a machine learning approach with appropriate normalization and preprocessing improves NIDS performance. This paper presents an empirical study of various normalization methods applied to KDD Cup’99, a benchmark network traffic dataset that has been used to evaluate NIDS models. The study shows that decimal normalization yields better prediction performance than non-normalized traffic data when traffic is categorized into ‘normal’ or ‘intrusive’ classes.
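As a rough illustration of what normalizing a traffic feature involves, the sketch below applies three common schemes to a single numeric column. Only decimal normalization is named in the abstract; min-max and z-score are included as assumptions about what "various normalization methods" might cover, and the sample byte counts are hypothetical values, not taken from KDD Cup’99.

```python
import math

def min_max(values):
    """Rescale values linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Center on the mean and divide by the population standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

def decimal_scaling(values):
    """Divide by the smallest power of ten that brings every |value| below 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / (10 ** j) >= 1:
        j += 1
    return [v / (10 ** j) for v in values]

# Hypothetical byte counts for four connections (illustrative only).
src_bytes = [0, 181, 239, 5450]

scaled_decimal = decimal_scaling(src_bytes)
scaled_min_max = min_max(src_bytes)
scaled_z = z_score(src_bytes)
print(scaled_decimal)
print(scaled_min_max)
print(scaled_z)
```

Decimal scaling only shifts the decimal point, so it preserves the relative spacing of the values, whereas z-score also recenters them around zero.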

Evaluation

Statistical Techniques for Network Security ◽ 10.4018/978-1-59904-708-9.ch012 ◽ 2011 ◽ pp. 427-457

Author(s): Yu Wang

Keyword(s): Intrusion Detection ◽ Network Traffic ◽ Goodness Of Fit ◽ False Negative ◽ Data Consistency ◽ Misclassification Rate ◽ Traffic Data ◽ Research Areas ◽ Need To Evaluate ◽ Or Economics

Increasing the accuracy of classification has been a constant challenge in the network security area. With the rapid growth in network traffic volume and advances in network bandwidth, many classification algorithms used for intrusion detection and prevention face high false positive and false negative rates. A stream of network traffic data with many positive predictors might not necessarily represent a true attack, and a seemingly anomaly-free stream could represent a novel attack. Depending on the infrastructure of a network system, traffic data can become very large. With such large volumes of data, even a very low misclassification rate can yield a large number of alarms; for example, a system handling 22 million connections per hour with a 1% misclassification rate could raise approximately 75 alarms within a second (excluding repeated connections). Reviewing and validating every such case is not practical. To address this challenge, we can improve the data collection process and develop more robust algorithms. Unlike other research areas, such as the life sciences, healthcare, or economics, where an analysis can be based on a single statistical approach, a robust intrusion detection scheme needs to be constructed hierarchically with multiple algorithms. For example, user behavior can be profiled and classified hierarchically using hybrid algorithms (e.g., combining statistics and AI). We can also improve the precision of classification by carefully evaluating the results. Several key elements are important for statistical evaluation in classification and prediction, such as reliability, sensitivity, specificity, misclassification, and goodness-of-fit. We also need to evaluate the goodness of the data (consistency and repeatability), the goodness of the classification, and the goodness of the model. We discuss these topics in this chapter.
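The alarm-volume argument and the evaluation measures named above can be made concrete with a little arithmetic. The sketch below assumes that every misclassified connection raises exactly one alarm; under that plain reading, 22 million connections per hour at a 1% misclassification rate works out to roughly 61 alarms per second, the same order of magnitude as the figure quoted in the abstract, which may rest on details not given here. The confusion-matrix counts are hypothetical and serve only to show how sensitivity, specificity, and misclassification rate are computed.

```python
def alarms_per_second(connections_per_hour, misclassification_rate):
    """Expected alarms per second if each misclassification raises one alarm."""
    return connections_per_hour * misclassification_rate / 3600.0

def evaluate(tp, fp, tn, fn):
    """Basic evaluation measures from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn),          # share of attacks detected
        "specificity": tn / (tn + fp),          # share of normal traffic passed
        "misclassification_rate": (fp + fn) / total,
    }

rate = alarms_per_second(22_000_000, 0.01)

# Hypothetical counts: 100 attacks and 900 normal connections.
metrics = evaluate(tp=90, fp=20, tn=880, fn=10)

print(round(rate, 1))
print(metrics)
```

Even a classifier with high sensitivity and specificity produces an unmanageable alarm stream at this traffic volume, which is the motivation for evaluating the results carefully.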

Intrusion Detection System Modeling Based on Learning from Network Traffic Data

KSII Transactions on Internet and Information Systems ◽ 10.3837/tiis.2018.11.022 ◽ 2018 ◽ Vol 12 (11)

Keyword(s): Intrusion Detection ◽ Network Traffic ◽ Intrusion Detection System ◽ Detection System ◽ System Modeling ◽ Traffic Data

Extending Isolation Forest for Anomaly Detection in Big Data via K-Means

ACM Transactions on Cyber-Physical Systems ◽ 10.1145/3460976 ◽ 2021 ◽ Vol 5 (4) ◽ pp. 1-26

Author(s): Md Tahmid Rahman Laskar ◽ Jimmy Xiangji Huang ◽ Vladan Smetana ◽ Chris Stewart ◽ Kees Pouw ◽ ...

Keyword(s): Big Data ◽ Intrusion Detection ◽ Anomaly Detection ◽ Computer Networks ◽ Network Traffic ◽ Intrusion Detection Systems ◽ Traffic Data ◽ Detection Systems ◽ Proposed Model ◽ Isolation Forest

Industrial Information Technology infrastructures are often vulnerable to cyberattacks. To secure the computer systems in an industrial environment, effective intrusion detection systems are required to monitor the cyber-physical systems (e.g., computer networks) in the industry for malicious activities. This article aims to build such intrusion detection systems to protect computer networks from cyberattacks. More specifically, we propose a novel unsupervised machine learning approach that combines the K-Means algorithm with the Isolation Forest for anomaly detection in industrial big data scenarios. Since our objective is to build an intrusion detection system for big data scenarios in the industrial domain, we use the Apache Spark framework to implement our proposed model, which was trained on large network traffic data (about 123 million instances of network traffic) stored in Elasticsearch. Moreover, we evaluate our proposed model on live streaming data and find that the system can be used for real-time anomaly detection in an industrial setup. In addition, we address the challenges we faced while training our model on large datasets and describe explicitly how these issues were resolved. Based on our empirical evaluation in different use cases for anomaly detection in real-world network traffic data, we observe that our proposed system is effective at detecting anomalies in big data scenarios. Finally, we evaluate our proposed model on several academic datasets to compare it with other models and find that its performance is comparable to that of other state-of-the-art approaches.
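The abstract does not say how K-Means and the Isolation Forest are combined, so the following is only a toy sketch of one plausible arrangement and should not be read as the authors' method: a small isolation forest assigns each point an anomaly score, and a one-dimensional 2-means split of those scores separates "anomalous" from "normal" without a hand-picked threshold. It is written in plain Python rather than Apache Spark, and every function name, parameter, and data point is hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def c_factor(n):
    """Average path length of an unsuccessful search in a binary tree of n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Isolate points with random axis-aligned splits until the depth limit."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))

def path_length(tree, x, depth=0):
    """Depth at which x lands, plus an adjustment for unsplit leaf points."""
    if tree[0] == "leaf":
        return depth + c_factor(tree[1])
    _, dim, split, left, right = tree
    child = left if x[dim] < split else right
    return path_length(child, x, depth + 1)

def anomaly_scores(data, n_trees=100, sample_size=64, seed=0):
    """Score each point in (0, 1); short average paths give scores near 1."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(data, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c_factor(sample_size)
    return [2.0 ** (-sum(path_length(t, x) for t in trees) / (n_trees * norm))
            for x in data]

def two_means_threshold(scores, iters=50):
    """1-D K-Means with k=2; returns the midpoint between the two centers."""
    lo, hi = min(scores), max(scores)
    for _ in range(iters):
        upper = [s for s in scores if abs(s - hi) <= abs(s - lo)]
        lower = [s for s in scores if abs(s - hi) > abs(s - lo)]
        if not lower:  # all scores identical, nothing to separate
            break
        new_lo, new_hi = sum(lower) / len(lower), sum(upper) / len(upper)
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2.0

# Hypothetical data: 63 ordinary points in the unit square and one far outlier.
rng = random.Random(1)
traffic = [(rng.random(), rng.random()) for _ in range(63)] + [(100.0, 100.0)]

scores = anomaly_scores(traffic)
threshold = two_means_threshold(scores)
flagged = [i for i, s in enumerate(scores) if s > threshold]
print(flagged)
```

Clustering the scores replaces the contamination parameter that isolation-forest implementations usually require; whether the paper uses K-Means in this role or in another (for example, to partition the data before scoring) is not stated in the abstract.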

Dynamic Network Traffic Data Classification for Intrusion Detection Using Genetic Algorithm

MOVICAB-IDS: Visual Analysis of Network Traffic Data Streams for Intrusion Detection

Online Learning for Network Traffic Data Classification

Processing and Analytics of Big Network Traffic Data for Intrusion Detection

Neural visualization of network traffic data for intrusion detection

Evaluation of K-Means Clustering for Effective Intrusion Detection and Prevention in Massive Network Traffic Data

A Network Intrusion Detection System for Concept Drifting Network Traffic Data

A comparative simulation of normalization methods for machine learning-based intrusion detection systems using KDD Cup’99 dataset

Evaluation

Intrusion Detection System Modeling Based on Learning from Network Traffic Data

Extending Isolation Forest for Anomaly Detection in Big Data via K-Means
