Intrusion Detection Using Random Forests Classifier with SMOTE and Feature Reduction

Author(s):  
Abebe Tesfahun ◽  
D. Lalitha Bhaskari
2014 ◽  
Vol 52 ◽  
Author(s):  
Ralf C. Staudemeyer ◽  
Christian W. Omlin

This work presents a data preprocessing and feature selection framework to support data mining and network security experts in minimal feature set selection of intrusion detection data. This process is supported by detailed visualisation and examination of class distributions. Distribution histograms, scatter plots and information gain are presented as supportive feature reduction tools. The feature reduction process applied is based on decision tree pruning and backward elimination. This paper starts with an analysis of the KDD Cup '99 datasets and their potential for feature reduction. The dataset consists of connection records with 41 features whose relevance for intrusion detection are not clear. All traffic is either classified `normal' or into the four attack types denial-of-service, network probe, remote-to-local or user-to-root. Using our custom feature selection process, we show how we can significantly reduce the number features in the dataset to a few salient features. We conclude by presenting minimal sets with 4--8 salient features for two-class and multi-class categorisation for detecting intrusions, as well as for the detection of individual attack classes; the performance using a static classifier compares favourably to the performance using all features available. The suggested process is of general nature and can be applied to any similar dataset.


2021 ◽  
Vol 14 (1) ◽  
pp. 192-202
Author(s):  
Karrar Alwan ◽  
◽  
Ahmed AbuEl-Atta ◽  
Hala Zayed ◽  
◽  
...  

Accurate intrusion detection is necessary to preserve network security. However, developing efficient intrusion detection system is a complex problem due to the nonlinear nature of the intrusion attempts, the unpredictable behaviour of network traffic, and the large number features in the problem space. Hence, selecting the most effective and discriminating feature is highly important. Additionally, eliminating irrelevant features can improve the detection accuracy as well as reduce the learning time of machine learning algorithms. However, feature reduction is an NPhard problem. Therefore, several metaheuristics have been employed to determine the most effective feature subset within reasonable time. In this paper, two intrusion detection models are built based on a modified version of the firefly algorithm to achieve the feature selection task. The first and, the second models have been used for binary and multiclass classification, respectively. The modified firefly algorithm employed a mutation operation to avoid trapping into local optima through enhancing the exploration capabilities of the original firefly. The significance of the selected features is evaluated using a Naïve Bayes classifier over a benchmark standard dataset, which contains different types of attacks. The obtained results revealed the superiority of the modified firefly algorithm against the original firefly algorithm in terms of the classification accuracy and the number of selected features under different scenarios. Additionally, the results assured the superiority of the proposed intrusion detection system against other recently proposed systems in both binary classification and multi-classification scenarios. The proposed system has 96.51% and 96.942% detection accuracy in binary classification and multi-classification, respectively. Moreover, the proposed system reduced the number of attributes from 41 to 9 for binary classification and to 10 for multi-classification.


Author(s):  
Sadhana Patidar ◽  
Priyanka Parihar ◽  
Chetan Agrawal

Now-a-days with growing applications over internet increases the security issues over network. Many security applications are designed to cope with such security concerns but still it required more attention to improve speed as well accuracy. With advancement of technologies there is also evolution of new threats or attacks in network. So, it is required to design such detection system that can handle new threats in network. One of the network security tools is intrusion detection system which is used to detect malicious data packets. Machine learning tool is also used to improve efficiency of network-based intrusion detection system. In this paper, an intrusion detection system is proposed with an application of machine learning tools. The proposed model integrates feature reduction, affinity clustering and multilevel Ensemble Support Vector Machine. The proposed model performance is analyzed over two datasets i.e. NSL-KDD and UNSW-NB 15 dataset and achieved approx. 12% of efficiency over other existing work.


2020 ◽  
Vol 7 (1) ◽  
Author(s):  
Sydney M. Kasongo ◽  
Yanxia Sun

AbstractComputer networks intrusion detection systems (IDSs) and intrusion prevention systems (IPSs) are critical aspects that contribute to the success of an organization. Over the past years, IDSs and IPSs using different approaches have been developed and implemented to ensure that computer networks within enterprises are secure, reliable and available. In this paper, we focus on IDSs that are built using machine learning (ML) techniques. IDSs based on ML methods are effective and accurate in detecting networks attacks. However, the performance of these systems decreases for high dimensional data spaces. Therefore, it is crucial to implement an appropriate feature extraction method that can prune some of the features that do not possess a great impact in the classification process. Moreover, many of the ML based IDSs suffer from an increase in false positive rate and a low detection accuracy when the models are trained on highly imbalanced datasets. In this paper, we present an analysis the UNSW-NB15 intrusion detection dataset that will be used for training and testing our models. Moreover, we apply a filter-based feature reduction technique using the XGBoost algorithm. We then implement the following ML approaches using the reduced feature space: Support Vector Machine (SVM), k-Nearest-Neighbour (kNN), Logistic Regression (LR), Artificial Neural Network (ANN) and Decision Tree (DT). In our experiments, we considered both the binary and multiclass classification configurations. The results demonstrated that the XGBoost-based feature selection method allows for methods such as the DT to increase its test accuracy from 88.13 to 90.85% for the binary classification scheme.


Sign in / Sign up

Export Citation Format

Share Document