How to Effectively Collect and Process Network Data for Intrusion Detection?

The number of security breaches in cyberspace is on the rise. The intrusion detection research community is responding to this threat with intensive work. To keep defensive mechanisms up to date and relevant, realistic network traffic datasets are needed. Using flow-based data for machine-learning-based network intrusion detection is a promising direction for intrusion detection systems. However, many contemporary benchmark datasets do not contain features that are usable in the wild. The main contribution of this work is to close the research gap in identifying and investigating valuable features in the NetFlow schema that allow for effective, machine-learning-based network intrusion detection in the real world. To achieve this goal, several feature selection techniques were applied to five flow-based network intrusion detection datasets, establishing an informative flow-based feature set. The authors’ experience with deploying this kind of system shows that, to close the research-to-market gap and apply machine-learning-based intrusion detection in the real world, a set of labeled data from the end user has to be collected. This research aims to establish the minimal amount of data that is sufficient to train machine learning algorithms effectively for intrusion detection. The results show that a set of 10 features and a small amount of data are enough for the final model to perform very well.
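
As a rough illustration of the filter-style feature selection described above, the sketch below ranks flow features by their mutual information with the attack label and keeps the top k. The abstract does not name the techniques the authors applied, so mutual information is an assumed stand-in. The field names (`PROTOCOL`, `DST_PORT_CLASS`, `BYTES_BIN`), the pre-binned values, and the eight toy flows are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def top_k_features(rows, labels, k):
    """Score every column against the label and return the k best names."""
    names = list(rows[0].keys())
    scores = {
        name: mutual_information([row[name] for row in rows], labels)
        for name in names
    }
    ranked = sorted(names, key=lambda name: scores[name], reverse=True)
    return ranked[:k], scores


# Hypothetical, already discretised NetFlow-style records (assumption).
flows = [
    {"PROTOCOL": "tcp", "DST_PORT_CLASS": "well_known", "BYTES_BIN": "high"},
    {"PROTOCOL": "udp", "DST_PORT_CLASS": "well_known", "BYTES_BIN": "high"},
    {"PROTOCOL": "tcp", "DST_PORT_CLASS": "ephemeral", "BYTES_BIN": "high"},
    {"PROTOCOL": "udp", "DST_PORT_CLASS": "well_known", "BYTES_BIN": "high"},
    {"PROTOCOL": "tcp", "DST_PORT_CLASS": "ephemeral", "BYTES_BIN": "low"},
    {"PROTOCOL": "udp", "DST_PORT_CLASS": "ephemeral", "BYTES_BIN": "low"},
    {"PROTOCOL": "tcp", "DST_PORT_CLASS": "ephemeral", "BYTES_BIN": "low"},
    {"PROTOCOL": "udp", "DST_PORT_CLASS": "well_known", "BYTES_BIN": "low"},
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = attack, 0 = benign (toy labels)

selected, scores = top_k_features(flows, labels, k=2)
print(selected)
```

In this toy data `BYTES_BIN` fully determines the label and `PROTOCOL` is independent of it, so the ranking places `BYTES_BIN` first and `PROTOCOL` last. Continuous NetFlow fields would need binning, or a different estimator, before this approach applies.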

A predictive model for network intrusion detection using stacking approach

International Journal of Electrical and Computer Engineering (IJECE) ◽ 10.11591/ijece.v10i3.pp2734-2741 ◽ 2020 ◽ Vol 10 (3) ◽ pp. 2734

Author(s): Smitha Rajagopal ◽ Poornima Panduranga Kundapur ◽ Hareesh Katiganere Siddaramappa

Keyword(s): Machine Learning ◽ Intrusion Detection ◽ Cyber Attacks ◽ Machine Learning Algorithms ◽ Network Intrusion Detection ◽ Network Intrusion ◽ Technological Advances ◽ No Free Lunch Theorem ◽ Benchmark Datasets ◽ Robust Processing

As technology advances, cyber-attacks continue to hamper information systems. The changing dimensionality of the cyber threat landscape compels security experts to devise novel approaches to the problem of network intrusion detection. Machine learning algorithms are used extensively to detect intrusions because of their remarkable predictive power. This work presents an ensemble approach to network intrusion detection based on a concept called stacking. According to the popular no free lunch theorem of machine learning, employing a single classifier for the problem at hand may not be ideal for achieving generalization. The proposed work therefore emphasizes a combined approach to improve performance. GraphLab Create, a robust processing framework capable of handling massive data, was used to implement the proposed methodology. Two benchmark datasets, UNSW-NB15 and UGR'16, are used to demonstrate the validity of the predictions. Empirical investigation showed that the performance of the proposed approach is reasonably good. The contribution of the proposed approach lies in its ability to produce fewer misclassifications across the various attack vectors considered in the study.
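
To make the stacking idea concrete, here is a minimal pure-Python sketch, not the authors' GraphLab Create implementation. Base learners are fit on training folds, their out-of-fold predictions become the inputs of a meta-learner, and the base learners are then refit on all the data. The class names (`Stump`, `NearestCentroid`, `VoteTable`), the helper `fit_stack`, and the two-feature toy data are all hypothetical.

```python
from collections import Counter, defaultdict


class Stump:
    """Threshold rule on one feature: predict 1 when x[feature] > threshold."""

    def __init__(self, feature):
        self.feature, self.threshold = feature, 0.0

    def fit(self, X, y):
        values = sorted({x[self.feature] for x in X})
        # Candidate thresholds are midpoints between consecutive values.
        candidates = [(a + b) / 2 for a, b in zip(values, values[1:])] or values
        best = None
        for t in candidates:
            errors = sum((x[self.feature] > t) != bool(l) for x, l in zip(X, y))
            if best is None or errors < best:
                best, self.threshold = errors, t
        return self

    def predict(self, x):
        return int(x[self.feature] > self.threshold)


class NearestCentroid:
    """Predict the class whose mean vector is closest to x."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            members = [x for x, l in zip(X, y) if l == label]
            self.centroids[label] = [sum(c) / len(members) for c in zip(*members)]
        return self

    def predict(self, x):
        def dist(label):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[label]))
        return min(self.centroids, key=dist)


class VoteTable:
    """Meta-learner: majority label seen for each tuple of base predictions."""

    def fit(self, Z, y):
        counts = defaultdict(Counter)
        for z, label in zip(Z, y):
            counts[tuple(z)][label] += 1
        self.table = {z: c.most_common(1)[0][0] for z, c in counts.items()}
        return self

    def predict(self, z):
        z = tuple(z)
        # Unseen prediction pattern: fall back to a plain majority vote.
        return self.table.get(z, Counter(z).most_common(1)[0][0])


def fit_stack(make_bases, X, y, folds=3):
    n, Z = len(X), [None] * len(X)
    for f in range(folds):
        train = [i for i in range(n) if i % folds != f]
        bases = [b.fit([X[i] for i in train], [y[i] for i in train])
                 for b in make_bases()]
        for i in range(f, n, folds):  # held-out rows of this fold
            Z[i] = [b.predict(X[i]) for b in bases]
    meta = VoteTable().fit(Z, y)                  # trained on out-of-fold output
    bases = [b.fit(X, y) for b in make_bases()]   # refit on all data
    return lambda x: meta.predict([b.predict(x) for b in bases])


# Toy, cleanly separable data with two hypothetical scaled flow features.
X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.3], [0.3, 0.25], [0.25, 0.15], [0.2, 0.3],
     [0.8, 0.9], [0.9, 0.7], [0.85, 0.8], [0.7, 0.9], [0.75, 0.85], [0.9, 0.95]]
y = [0] * 6 + [1] * 6  # 0 = benign, 1 = attack

predict = fit_stack(lambda: [Stump(0), Stump(1), NearestCentroid()], X, y)
print(predict([0.2, 0.2]), predict([0.8, 0.8]))
```

Training the meta-learner only on out-of-fold predictions is the point of the design: it keeps the second stage from simply memorising base learners that have already seen the labels.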

Comparison of Machine Learning Algorithms to Build Optimized Network Intrusion Detection System

Journal of Computational and Theoretical Nanoscience ◽ 10.1166/jctn.2019.7929 ◽ 2019 ◽ Vol 16 (5) ◽ pp. 2541-2549 ◽ Cited By ~ 2

Author(s): H Parveen Sultana ◽ Nirvishi Shrivastava ◽ Dhanapal Durai Dominic ◽ N Nalini ◽ J. M Balajee

Keyword(s): Machine Learning ◽ Intrusion Detection ◽ Intrusion Detection System ◽ Detection System ◽ Learning Algorithms ◽ Machine Learning Algorithms ◽ Network Intrusion Detection ◽ Network Intrusion ◽ Network Intrusion Detection System

A Survey on Data-driven Network Intrusion Detection

ACM Computing Surveys ◽ 10.1145/3472753 ◽ 2022 ◽ Vol 54 (9) ◽ pp. 1-36

Author(s): Dylan Chou ◽ Meng Jiang

Keyword(s): Machine Learning ◽ Intrusion Detection ◽ Real World ◽ Data Driven ◽ Network Intrusion Detection ◽ Large Network ◽ Learning Models ◽ Simulated Environments ◽ Network Intrusion ◽ Machine Learning Models

In data-driven network intrusion detection (NID), attack classes tend to be minorities compared to normal traffic. Many datasets are collected in simulated environments rather than real-world networks. These challenges undermine the performance of machine learning models for intrusion detection, because the models are fit to unrepresentative “sandbox” datasets. This survey presents a taxonomy with eight main challenges and explores common datasets from 1999 to 2020. Trends in these challenges over the past decade are analyzed, and future directions are proposed: expanding NID into cloud-based environments, devising scalable models for large network data, and creating labeled datasets collected in real-world networks.
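
The imbalance between attack classes and normal traffic can be quantified with inverse-frequency class weights, one common mitigation that is not taken from the survey itself. The sketch below assumes the formula weight = n / (k * n_c), where n is the number of samples, k the number of classes, and n_c the size of class c. The label names and counts are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency: n / (k * n_c)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}


# Hypothetical label distribution: normal traffic dominates (assumption).
labels = ["normal"] * 90 + ["dos"] * 8 + ["probe"] * 2
weights = balanced_class_weights(labels)

for label, weight in sorted(weights.items(), key=lambda item: item[1]):
    print(f"{label}: {weight:.3f}")
```

With these weights every class contributes the same total weight to a weighted loss, so the rare `probe` class is no longer drowned out by normal traffic.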

Comparative Evaluation of Machine Learning Algorithms for Network Intrusion Detection Using Weka

Towards Extensible and Adaptable Methods in Computing ◽ 10.1007/978-981-13-2348-5_15 ◽ 2018 ◽ pp. 195-208 ◽ Cited By ~ 1

Author(s): Nureni Ayofe Azeez ◽ Obinna Justin Asuzu ◽ Sanjay Misra ◽ Adewole Adewumi ◽ Ravin Ahuja ◽ ...

Keyword(s): Machine Learning ◽ Intrusion Detection ◽ Comparative Evaluation ◽ Learning Algorithms ◽ Machine Learning Algorithms ◽ Network Intrusion Detection ◽ Network Intrusion

A Comprehensive Analysis of Accuracies of Machine Learning Algorithms for Network Intrusion Detection

Machine Learning for Networking - Lecture Notes in Computer Science ◽ 10.1007/978-3-030-45778-5_4 ◽ 2020 ◽ pp. 40-57

Author(s): Anurag Das ◽ Samuel A. Ajila ◽ Chung-Horng Lung

Keyword(s): Machine Learning ◽ Intrusion Detection ◽ Learning Algorithms ◽ Comprehensive Analysis ◽ Machine Learning Algorithms ◽ Network Intrusion Detection ◽ Network Intrusion

Adaptive Hybrid Model for Network Intrusion Detection and Comparison among Machine Learning Algorithms

International Journal of Machine Learning and Computing ◽ 10.7763/ijmlc.2015.v5.476 ◽ 2015 ◽ Vol 5 (1) ◽ pp. 17-23 ◽ Cited By ~ 12

Author(s): Md. Enamul Haque ◽ Talal M. Alkharobi

Keyword(s): Machine Learning ◽ Intrusion Detection ◽ Hybrid Model ◽ Learning Algorithms ◽ Machine Learning Algorithms ◽ Network Intrusion Detection ◽ Network Intrusion

Ensemble-Based Online Machine Learning Algorithms for Network Intrusion Detection Systems Using Streaming Data

Information ◽ 10.3390/info11060315 ◽ 2020 ◽ Vol 11 (6) ◽ pp. 315

Author(s): Nathan Martindale ◽ Muhammad Ismail ◽ Douglas A. Talbert

Keyword(s): Machine Learning ◽ Random Forest ◽ Intrusion Detection ◽ Learning Algorithms ◽ Machine Learning Algorithms ◽ Intrusion Detection Systems ◽ Network Intrusion Detection ◽ Detection Systems ◽ Network Intrusion ◽ Network Intrusion Detection Systems

As new cyberattacks are launched against systems and networks on a daily basis, the ability of network intrusion detection systems to operate efficiently in the big data era has become critically important, particularly as more low-power Internet-of-Things (IoT) devices enter the market. This has motivated research into machine learning algorithms that can operate on streams of data and are trained online or “live” on only a small amount of data kept in memory at a time, as opposed to the more classical approaches that are trained solely offline on all of the data at once. In this context, one important concept from machine learning for improving detection performance is the idea of “ensembles”, where a collection of machine learning algorithms is combined to compensate for their individual limitations and produce an overall superior algorithm. Unfortunately, existing research lacks a proper performance comparison between homogeneous and heterogeneous online ensembles. Hence, this paper investigates several homogeneous and heterogeneous ensembles, proposes three novel online heterogeneous ensembles for intrusion detection, and compares their accuracy, run-time complexity, and response to concept drift. Of the proposed novel online ensembles, the heterogeneous ensemble consisting of an adaptive random forest of Hoeffding Trees combined with a Hoeffding Adaptive Tree performed best, because it dealt with concept drift most effectively. While this scheme is less accurate than a larger adaptive random forest, it offered a marginally better run-time, which is beneficial for online training.
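
The online setting described above can be sketched as a test-then-train (prequential) loop: each arriving sample is first predicted by a majority vote of the ensemble and only then used to update every member. The learners below are deliberately simple stand-ins that keep only running counts and sums in memory; they are not the Hoeffding-tree models from the paper. The class names, the helper `prequential_accuracy`, and the synthetic stream are hypothetical.

```python
from collections import Counter


class MajorityClass:
    """Baseline learner: predict the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else 0

    def learn(self, x, y):
        self.counts[y] += 1


class RunningCentroid:
    """Nearest running class mean over the chosen feature indices."""

    def __init__(self, features):
        self.features = features
        self.sums = {}            # label -> per-feature running sums
        self.counts = Counter()   # label -> number of samples seen

    def predict(self, x):
        if not self.counts:
            return 0  # nothing learned yet: default to benign

        def distance(label):
            means = (s / self.counts[label] for s in self.sums[label])
            return sum((x[f] - m) ** 2 for f, m in zip(self.features, means))

        return min(self.counts, key=distance)

    def learn(self, x, y):
        sums = self.sums.setdefault(y, [0.0] * len(self.features))
        for k, f in enumerate(self.features):
            sums[k] += x[f]
        self.counts[y] += 1


def prequential_accuracy(stream, learners):
    """Test-then-train: score the vote on each sample, then update."""
    correct = total = 0
    for x, y in stream:
        votes = Counter(learner.predict(x) for learner in learners)
        correct += votes.most_common(1)[0][0] == y
        total += 1
        for learner in learners:
            learner.learn(x, y)
    return correct / total


def toy_stream(n):
    """Synthetic alternating benign (0) / attack (1) samples (assumption)."""
    for i in range(n):
        jitter = 0.01 * (i % 5)
        if i % 2 == 0:
            yield [0.2 + jitter, 0.1], 0
        else:
            yield [0.9 - jitter, 0.8], 1


ensemble = [MajorityClass(), RunningCentroid([0, 1]), RunningCentroid([0])]
accuracy = prequential_accuracy(toy_stream(40), ensemble)
print(accuracy)
```

Because no sample is stored after it has been learned from, memory use stays constant however long the stream runs. Handling concept drift, as the adaptive models in the paper do, would additionally require forgetting or resetting old statistics.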

Evaluation of Network Intrusion Detection with Features Selection and Machine Learning Algorithms on CICIDS-2017 Dataset

SSRN Electronic Journal ◽ 10.2139/ssrn.3394103 ◽ 2019

Author(s): Shailesh Singh Panwar ◽ Y. P. Raiwani ◽ Lokesh Singh Panwar

Keyword(s): Machine Learning ◽ Intrusion Detection ◽ Learning Algorithms ◽ Machine Learning Algorithms ◽ Network Intrusion Detection ◽ Features Selection ◽ Network Intrusion

Assessment of Machine Learning Algorithms for Network Intrusion Detection

International Journal of Engineering and Advanced Technology - Regular Issue ◽ 10.35940/ijeat.d8689.049420 ◽ 2020 ◽ Vol 9 (4) ◽ pp. 1667-1671

Keyword(s): Machine Learning ◽ Intrusion Detection ◽ Performance Metrics ◽ Detection System ◽ Machine Learning Algorithms ◽ Network Intrusion Detection ◽ Learning Models ◽ Network Intrusion ◽ Tree Classifier ◽ Machine Learning Models

A Network Intrusion Detection System (NIDS) is a framework for identifying network intrusions and misuse by monitoring network traffic and classifying it as either normal or anomalous. Numerous intrusion detection systems have been implemented using simulated datasets such as the KDD’99 intrusion dataset, but none of them uses a real-time dataset. The proposed work runs and evaluates experiments that survey different machine learning models on the KDD’99 intrusion dataset and on a dataset created in real time. The required performance metrics were computed for the machine learning models in order to assess the chosen classifiers. The emphasis was on the accuracy metric, with the aim of improving the detection rate of the intrusion detection system. The implemented algorithms showed that the decision tree classifier achieved the highest accuracy, while the logistic regression classifier achieved the lowest accuracy, on both of the datasets used.
