Network Intrusion Detection Systems Using Supervised Machine Learning Classification and Dimensionality Reduction Techniques: A Systematic Review

Author(s):  
Zein Ashi ◽  
Laila Aburashed ◽  
Mahmoud Qudah ◽  
Abdallah Qusef
Information ◽  
2020 ◽  
Vol 11 (6) ◽  
pp. 315
Author(s):  
Nathan Martindale ◽  
Muhammad Ismail ◽  
Douglas A. Talbert

As new cyberattacks are launched against systems and networks on a daily basis, the ability for network intrusion detection systems to operate efficiently in the big data era has become critically important, particularly as more low-power Internet-of-Things (IoT) devices enter the market. This has motivated research in applying machine learning algorithms that can operate on streams of data, trained online or “live” on only a small amount of data kept in memory at a time, as opposed to the more classical approaches that are trained solely offline on all of the data at once. In this context, one important concept from machine learning for improving detection performance is the idea of “ensembles”, where a collection of machine learning algorithms are combined to compensate for their individual limitations and produce an overall superior algorithm. Unfortunately, existing research lacks proper performance comparison between homogeneous and heterogeneous online ensembles. Hence, this paper investigates several homogeneous and heterogeneous ensembles, proposes three novel online heterogeneous ensembles for intrusion detection, and compares their performance accuracy, run-time complexity, and response to concept drifts. Out of the proposed novel online ensembles, the heterogeneous ensemble consisting of an adaptive random forest of Hoeffding Trees combined with a Hoeffding Adaptive Tree performed the best, by dealing with concept drift in the most effective way. While this scheme is less accurate than a larger size adaptive random forest, it offered a marginally better run-time, which is beneficial for online training.


2019 ◽  
Author(s):  
Abhishek Verma ◽  
Virender Ranga

In the era of digital revolution, a huge amount of data is being generated from different networks on a daily basis. Security of this data is of utmost importance. Intrusion Detection Systems are found to be one the best solutions towards detecting intrusions. Network Intrusion Detection Systems are employed as a defence system to secure networks. Various techniques for the effective development of these defence systems have been proposed in the literature. However, the research on the development of datasets used for training and testing purpose of such defence systems is equally concerned. Better datasets improve the online and offline intrusion detection capability of detection model. Benchmark datasets like KDD 99 and NSL-KDD cup 99 obsolete and do not contain network traces of modern attacks like Denial of Service, hence are unsuitable for the evaluation purpose. In this work, a detailed analysis of CIDDS-001 dataset has been done and presented. We have used different well-known machine learning techniques for analysing the complexity of the dataset. Eminent evaluation metrics including Detection Rate, Accuracy, False Positive Rate, Kappa statistics, Root mean squared error have been used to show the performance of employed machine learning techniques.


2019 ◽  
Author(s):  
Abhishek Verma ◽  
Virender Ranga

In the era of digital revolution, a huge amount of data is being generated from different networks on a daily basis. Security of this data is of utmost importance. Intrusion Detection Systems are found to be one the best solutions towards detecting intrusions. Network Intrusion Detection Systems are employed as a defence system to secure networks. Various techniques for the effective development of these defence systems have been proposed in the literature. However, the research on the development of datasets used for training and testing purpose of such defence systems is equally concerned. Better datasets improve the online and offline intrusion detection capability of detection model. Benchmark datasets like KDD 99 and NSL-KDD cup 99 obsolete and do not contain network traces of modern attacks like Denial of Service, hence are unsuitable for the evaluation purpose. In this work, a detailed analysis of CIDDS-001 dataset has been done and presented. We have used different well-known machine learning techniques for analysing the complexity of the dataset. Eminent evaluation metrics including Detection Rate, Accuracy, False Positive Rate, Kappa statistics, Root mean squared error have been used to show the performance of employed machine learning techniques.


Entropy ◽  
2021 ◽  
Vol 23 (11) ◽  
pp. 1532
Author(s):  
Mikołaj Komisarek ◽  
Marek Pawlicki ◽  
Rafał Kozik ◽  
Witold Hołubowicz ◽  
Michał Choraś

The number of security breaches in the cyberspace is on the rise. This threat is met with intensive work in the intrusion detection research community. To keep the defensive mechanisms up to date and relevant, realistic network traffic datasets are needed. The use of flow-based data for machine-learning-based network intrusion detection is a promising direction for intrusion detection systems. However, many contemporary benchmark datasets do not contain features that are usable in the wild. The main contribution of this work is to cover the research gap related to identifying and investigating valuable features in the NetFlow schema that allow for effective, machine-learning-based network intrusion detection in the real world. To achieve this goal, several feature selection techniques have been applied on five flow-based network intrusion detection datasets, establishing an informative flow-based feature set. The authors’ experience with the deployment of this kind of system shows that to close the research-to-market gap, and to perform actual real-world application of machine-learning-based intrusion detection, a set of labeled data from the end-user has to be collected. This research aims at establishing the appropriate, minimal amount of data that is sufficient to effectively train machine learning algorithms in intrusion detection. The results show that a set of 10 features and a small amount of data is enough for the final model to perform very well.


2008 ◽  
Vol 17 (03) ◽  
pp. 449-463
Author(s):  
HANS HOLLAND ◽  
MIROSLAV KUBAT ◽  
JAN ŽIŽKA

In an attempt to automate evaluation of network intrusion detection systems, we encountered the problem of ambiguously described learning examples. For instance, an attribute's value, or a class label, in a given example was known to be a or b but definitely not c or d. Previous research in machine learning usually either “disambiguated” the value (by giving preference to a or b), or replaced it with a “don't-know” symbol. Neither approach is satisfactory: while the former distorts the available information by pretending precise knowledge, the latter ignores the fact that at least something is known. Our experiments confirm the intuition that classification performance is indeed impaired if the ambiguities are not handled properly. In the research reported here, we limited ourselves to the realm of the relatively simple nearest-neighbor classifiers and investigated a few alternative solutions. The paper describes the techniques we used and describes their behavior in experimental domains.


2020 ◽  
Vol 10 (5) ◽  
pp. 1775 ◽  
Author(s):  
Roberto Magán-Carrión ◽  
Daniel Urda ◽  
Ignacio Díaz-Cano ◽  
Bernabé Dorronsoro

Presently, we are living in a hyper-connected world where millions of heterogeneous devices are continuously sharing information in different application contexts for wellness, improving communications, digital businesses, etc. However, the bigger the number of devices and connections are, the higher the risk of security threats in this scenario. To counteract against malicious behaviours and preserve essential security services, Network Intrusion Detection Systems (NIDSs) are the most widely used defence line in communications networks. Nevertheless, there is no standard methodology to evaluate and fairly compare NIDSs. Most of the proposals elude mentioning crucial steps regarding NIDSs validation that make their comparison hard or even impossible. This work firstly includes a comprehensive study of recent NIDSs based on machine learning approaches, concluding that almost all of them do not accomplish with what authors of this paper consider mandatory steps for a reliable comparison and evaluation of NIDSs. Secondly, a structured methodology is proposed and assessed on the UGR’16 dataset to test its suitability for addressing network attack detection problems. The guideline and steps recommended will definitively help the research community to fairly assess NIDSs, although the definitive framework is not a trivial task and, therefore, some extra effort should still be made to improve its understandability and usability further.


Sign in / Sign up

Export Citation Format

Share Document