Comparative Study of Datasets used in Cyber Security Intrusion Detection

Author(s):  
Rahul Yadav ◽  
Phalguni Pathak ◽  
Saumya Saraswat

In recent years, deep learning frameworks have been applied in various domains and have shown promising performance in applications such as malware detection software, self-driving cars, and identity recognition cameras. At the same time, adversarial attacks have become a crucial security threat to many deep learning applications. Deep learning techniques have become a core part of several cyber security applications, including intrusion detection, Android malware detection, spam detection, malware classification, binary analysis, and phishing detection. One of the major research challenges in this field is the lack of a comprehensive data set that reflects contemporary network traffic scenarios, a broad range of low-footprint intrusions, and in-depth structured information about the network traffic. Many benchmark data sets for evaluating network intrusion detection systems were developed a decade ago. In this paper, we provide a focused literature survey of data sets used for network-based intrusion detection and characterize in detail the underlying packet-based and flow-based network data. Because data sets play a vital role in intrusion detection, we describe the available cyber security data sets and provide a categorization of them.

2021 ◽  
Author(s):  
Ming Li ◽  
Dezhi Han ◽  
Dun Li ◽  
Han Liu ◽  
Chin-Chen Chang

Abstract: Network intrusion detection, which relies mainly on extracting and analyzing network traffic features, plays a vital role in network security protection. Current feature extraction and analysis for network intrusion detection mostly uses deep learning algorithms. However, deep learning requires substantial training resources and handles imbalanced data sets poorly. In this paper, a deep learning model (MFVT) based on a feature fusion network and the Vision Transformer architecture is proposed, which improves the handling of imbalanced data sets and reduces the amount of sample data needed for training. In addition, to improve on traditional raw traffic feature extraction methods, a new raw traffic feature extraction method (CRP) is proposed; CRP uses the PCA algorithm to reduce all processed digital traffic features to a specified dimension. On the IDS 2017 and IDS 2012 data sets, ablation experiments show that the proposed MFVT model performs significantly better than other network intrusion detection models and that its detection accuracy reaches the state-of-the-art level. When the MFVT model is combined with the CRP algorithm, the detection accuracy further improves to 99.99%.
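
As a rough illustration of the dimensionality-reduction step mentioned in the abstract above, the following sketch projects a feature matrix onto its leading principal components using NumPy. The function name `reduce_features`, the random input, and the target dimension of 8 are assumptions made for illustration only; the abstract does not describe the authors' actual preprocessing pipeline.

```python
import numpy as np


def reduce_features(features: np.ndarray, n_components: int) -> np.ndarray:
    """Project the rows of `features` onto the top `n_components` principal axes."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


rng = np.random.default_rng(0)
traffic = rng.normal(size=(200, 32))  # hypothetical: 200 flows, 32 numeric features
reduced = reduce_features(traffic, 8)
print(reduced.shape)
```

Using the SVD of the centered data avoids forming the covariance matrix explicitly, which is one common way to compute PCA.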


Symmetry ◽  
2021 ◽  
Vol 13 (8) ◽  
pp. 1453
Author(s):  
Renjian Lyu ◽  
Mingshu He ◽  
Yu Zhang ◽  
Lei Jin ◽  
Xinlei Wang

Deep learning has been applied in the field of network intrusion detection and has yielded good results. In malicious network traffic classification tasks, many studies have achieved good performance with respect to the accuracy and recall rate of classification through self-designed models. In deep learning, the design of the model architecture greatly influences the results. However, the design of the network model architecture usually requires substantial professional knowledge. At present, the focus of research in the field of traffic monitoring is often directed elsewhere. Therefore, in the classification task of the network intrusion detection field, there is much room for improvement in the design and optimization of the model architecture. A neural architecture search (NAS) can automatically search the architecture of the model under the premise of a given optimization goal. For this reason, we propose a model that can perform NAS in the field of network traffic classification and search for the optimal architecture suitable for traffic detection based on the network traffic dataset. Each layer of our depth model is constructed according to the principle of maximum coding rate attenuation, which has strong consistency and symmetry in structure. Compared with some manually designed network architectures, classification indicators, such as Top-1 accuracy and F1 score, are also greatly improved while ensuring the lightweight nature of the model. In addition, we introduce a surrogate model in the search task. Compared to using the traditional NAS model to search the network traffic classification model, our NAS model greatly improves the search efficiency under the premise of ensuring that the results are not substantially different. We also manually adjust some operations in the search space of the architecture search to find a set of model operations that are more suitable for traffic classification. 
Finally, we apply the searched model to other traffic datasets to verify the universality of the model. Compared with several common network models in the traffic field, the searched model (NAS-Net) performs better, and the classification effect is more accurate.


Sensors ◽  
2020 ◽  
Vol 20 (14) ◽  
pp. 3817 ◽  
Author(s):  
Zhidong Wang ◽  
Yingxu Lai ◽  
Zenghui Liu ◽  
Jing Liu

Intrusion detection is only the initial part of the security system for an industrial control system. Because of the criticality of the industrial control system, professionals still make the most important security decisions. Therefore, a simple intrusion alarm has a very limited role in the security system, and intrusion detection models based on deep learning struggle to provide more information because of the lack of explanation. This limits the application of deep learning methods to industrial control network intrusion detection. We analyzed the deep neural network (DNN) model and the interpretable classification model from the perspective of information, and clarified the correlation between the calculation process of the DNN model and the classification process. By comparing the normal samples with the abnormal samples, the abnormalities that occur during the calculation of the DNN model compared to the normal samples could be found. Based on this, a layer-wise relevance propagation method was designed to map the abnormalities in the calculation process to the abnormalities of attributes. At the same time, considering that the data set may already contain some useful information, we designed filtering rules for a kind of data set that can be obtained at a low cost, so that the calculation result is presented in a more accurate manner, which should help professionals lock and address intrusion threats more quickly.
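
To make the idea of mapping a network's output back to input attributes more concrete, here is a minimal sketch of the widely used epsilon rule for layer-wise relevance propagation on a tiny dense ReLU network. It is not the method designed in the paper above; the weights, the input, and the helper names `dense` and `lrp_epsilon` are hypothetical.

```python
def dense(x, w, b):
    """Return pre-activations z_j = sum_i x_i * w[i][j] + b_j."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) + b[j]
            for j in range(len(b))]


def lrp_epsilon(x, w, z, relevance_out, eps=1e-6):
    """Epsilon rule: R_i = sum_j x_i * w_ij / (z_j + eps * sign(z_j)) * R_j."""
    relevance_in = [0.0] * len(x)
    for j, r_j in enumerate(relevance_out):
        denom = z[j] + (eps if z[j] >= 0 else -eps)
        for i in range(len(x)):
            relevance_in[i] += x[i] * w[i][j] / denom * r_j
    return relevance_in


# Hypothetical toy network: 3 input attributes -> 2 hidden ReLU units -> 1 score.
w1 = [[1.0, 0.5], [0.5, 1.0], [-1.0, 0.5]]
b1 = [0.0, 0.0]
w2 = [[1.0], [2.0]]
b2 = [0.0]

x = [1.0, 2.0, 1.0]
z1 = dense(x, w1, b1)
h = [max(0.0, v) for v in z1]
z2 = dense(h, w2, b2)

# Start from the output score and propagate relevance back layer by layer.
r_hidden = lrp_epsilon(h, w2, z2, z2)
r_input = lrp_epsilon(x, w1, z1, r_hidden)
print(r_input)
```

With zero biases, the relevance assigned to the inputs sums (up to the small epsilon term) to the output score, so each attribute's share can be read as its contribution to the decision.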


IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 125786-125796
Author(s):  
Jiayin Feng ◽  
Limin Shen ◽  
Zhen Chen ◽  
Yuying Wang ◽  
Hui Li

2021 ◽  
Vol 11 (23) ◽  
pp. 11283
Author(s):  
Hsiao-Chung Lin ◽  
Ping Wang ◽  
Kuo-Ming Chao ◽  
Wen-Hui Lin ◽  
Zong-Yu Yang

Most approaches for detecting network attacks involve threat analyses that match the attack to potential malicious profiles using behavioral analysis techniques in conjunction with packet collection, filtering, and feature comparison. Experts in information security are often required to study these threats, and judging new types of threats accurately in real time is often impossible. Distinguishing legitimate from malicious connections using protocol analysis is difficult; therefore, machine learning-based function modules can be added to intrusion detection systems to assist experts in accurately judging threat categories by analyzing the threat and learning its characteristics. In this paper, an ensemble learning scheme based on a revised random forest algorithm is proposed for a security monitoring system in the domain of renewable energy to categorize network threats in a network intrusion detection system. To reduce classification error for minority classes of experimental data in model training, the synthetic minority oversampling technique (SMOTE) was used to re-balance the original data sets by increasing the number of data points in the minority classes. The classification performance of the proposed classifier on unbalanced data was experimentally verified in terms of accuracy, precision, recall, and F1-score on the UNSW-NB15 and CSE-CIC-IDS 2018 data sets. A cross-validation scheme featuring support vector machines was used to compare classification accuracies.
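
The core of SMOTE, as referred to in the abstract above, is to create synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The sketch below shows that step in plain Python; the function name `smote`, the sample points, and the parameter values are assumptions for illustration, and a production system would more likely rely on an established library implementation.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the chosen point (excluding itself).
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class (attack) samples with two numeric features.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_new=6)
print(len(minority) + len(new_points))
```

Because every synthetic point lies on a segment between two real minority samples, the new data stays inside the region already occupied by the minority class.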


2021 ◽  
Author(s):  
Yan Jian ◽  
Xiaoyang Dong ◽  
Liang Jian

Abstract: Based on deep learning, this study combined a sparse autoencoder (SAE) with an extreme learning machine (ELM) to design an SAE-ELM method that reduces the dimension of data features and classifies different types of data. Experiments were carried out on the NSL-KDD and UNSW-NB2015 data sets. The results showed that the proposed method outperformed the K-means algorithm and the SVM algorithm. On the NSL-KDD data set, the average accuracy rate of the SAE-ELM method was 98.93%, the false alarm rate was 0.17%, and the missing report rate was 5.36%. On the UNSW-NB2015 data set, the accuracy rate of the SAE-ELM method was 98.88%, the false alarm rate was 0.12%, and the missing report rate was 4.31%. The results show that the SAE-ELM method is effective in the detection and recognition of abnormal data and can be popularized and applied.
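
The three figures reported above can be computed from a confusion matrix. The sketch below assumes the usual definitions, which the abstract itself does not spell out: false alarm rate as FP / (FP + TN) and missing report rate as FN / (FN + TP). The labels are made up, and the function assumes both classes occur in `y_true`.

```python
def detection_metrics(y_true, y_pred):
    """Return accuracy, false alarm rate and missing report rate (1 = attack, 0 = normal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "false_alarm_rate": fp / (fp + tn),      # normal traffic flagged as attack
        "missing_report_rate": fn / (fn + tp),   # attacks that went undetected
    }


# Hypothetical labels: 4 attacks and 6 normal records.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = detection_metrics(y_true, y_pred)
print(metrics)
```

Reporting all three values matters on imbalanced data, where a high accuracy can coexist with a high missing report rate.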


2020 ◽  
Vol 6 ◽  
pp. e327
Author(s):  
Thavavel Vaiyapuri ◽  
Adel Binbusayyis

The ever-increasing use of the internet has opened a new avenue for cybercriminals, pressing online businesses and organizations to stay ahead of the evolving threat landscape. To this end, the intrusion detection system (IDS) is regarded as a promising defensive mechanism for ensuring network security. Recently, deep learning has gained ground in the field of intrusion detection, but the majority of progress has been in supervised learning, which requires adequate labeled data for training. In practice, labeling a high volume of network traffic is laborious and error-prone. Consequently, unsupervised deep learning approaches have been gaining momentum. Specifically, advances in deep learning have endowed the autoencoder (AE) with a greater ability for data reconstruction, allowing it to learn robust feature representations from massive amounts of data. Nevertheless, no study has evaluated the potential of different AE variants as one-class classifiers for intrusion detection. This study fills this gap by presenting a comparative evaluation of different AE variants for one-class unsupervised intrusion detection. The evaluation includes five AE variants: Stacked AE, Sparse AE, Denoising AE, Contractive AE, and Convolutional AE. Further, the study intends to conduct a fair comparison by establishing a unified network configuration and training scheme for all variants on the common benchmark data sets NSL-KDD and UNSW-NB15. The comparative evaluation provides valuable insight into how different AE variants can be used as one-class classifiers to build an effective unsupervised IDS. The outcome of this study will be of great interest to the network security community, as it provides a promising path for building effective IDS based on deep learning approaches while alleviating the need for adequate and diverse intrusion network traffic behavior.
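
One-class detection with an autoencoder typically comes down to a decision rule: fit on normal traffic only, then flag any record whose reconstruction error exceeds a threshold derived from the training errors. The sketch below illustrates only that rule. The "reconstructor" is a deliberately trivial stand-in (it always returns the mean normal row) rather than a trained autoencoder, and the data, the quantile, and all function names are assumptions.

```python
def reconstruction_error(row, reconstructed):
    """Mean squared error between a record and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(row, reconstructed)) / len(row)


def fit_one_class(normal_rows, quantile=0.95):
    """Fit on normal traffic only; return (reconstruct, threshold)."""
    n_features = len(normal_rows[0])
    mean = [sum(r[i] for r in normal_rows) / len(normal_rows)
            for i in range(n_features)]

    def reconstruct(row):
        # Stand-in for a trained autoencoder: always reconstructs the mean normal row.
        return mean

    errors = sorted(reconstruction_error(r, reconstruct(r)) for r in normal_rows)
    threshold = errors[min(len(errors) - 1, int(quantile * len(errors)))]
    return reconstruct, threshold


def is_anomalous(row, reconstruct, threshold):
    """Flag a record whose reconstruction error exceeds the training threshold."""
    return reconstruction_error(row, reconstruct(row)) > threshold


# Hypothetical normal traffic with two numeric features.
normal = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.2], [1.2, 1.0]]
reconstruct, threshold = fit_one_class(normal)
print(is_anomalous([5.0, 5.0], reconstruct, threshold))
print(is_anomalous([1.0, 1.0], reconstruct, threshold))
```

Swapping the stand-in for any of the AE variants named above would leave the thresholding logic unchanged, which is what makes a unified comparison of variants possible.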


2019 ◽  
Vol 11 (3) ◽  
pp. 65-89 ◽  
Author(s):  
Vinayakumar R ◽  
Soman KP ◽  
Prabaharan Poornachandran

Recently, owing to the impressive results of deep learning techniques in image recognition, natural language processing, and speech recognition for various long-standing artificial intelligence (AI) tasks, there has been great interest in applying them to security tasks as well. This article focuses on applying these deep taxonomy techniques to network intrusion detection systems (N-IDS) with the aim of improving performance in classifying network connections as either good or bad. To substantiate this for N-IDS, this article models network traffic as time series data, specifically transmission control protocol / internet protocol (TCP/IP) packets in a predefined time window, using supervised deep learning methods such as the recurrent neural network (RNN), the identity recurrent neural network (IRNN, an RNN whose recurrent weights are initialized to the identity matrix), long short-term memory (LSTM), clock-work RNN (CWRNN), and the gated recurrent unit (GRU), on connection records of the KDDCup-99 challenge data set. The main interest is in evaluating the performance of the RNN against newly introduced methods such as LSTM and IRNN, which alleviate the vanishing and exploding gradient problem in memorizing long-term dependencies. The efficient network architecture for all deep models is chosen by comparing the performance of various network topologies and network parameters. Experiments with the chosen configurations were run for up to 1,000 epochs with learning rates varied between 0.01-05. The observed results of IRNN are relatively close to the performance of LSTM on the KDDCup-99 NIDS data set. In addition to KDDCup-99, the effectiveness of the deep model architectures is evaluated on a refined version of KDDCup-99, NSL-KDD, and on the most recent one, the UNSW-NB15 NIDS data set.
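
For readers unfamiliar with the IRNN mentioned above, the sketch below runs a forward pass of a plain recurrent layer whose recurrent weight matrix is the identity and whose activation is ReLU. It shows only the initialization idea, not the article's trained models; the input weights, the three-step sequence, and the function names are hypothetical.

```python
def identity(n):
    """Return an n x n identity matrix as nested lists."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def irnn_forward(sequence, w_in, w_rec):
    """Apply h_t = relu(W_in x_t + W_rec h_{t-1}) over a sequence; return the last state."""
    hidden = [0.0] * len(w_rec)
    for x in sequence:
        hidden = [
            max(0.0,
                sum(w_in[j][i] * x[i] for i in range(len(x)))
                + sum(w_rec[j][k] * hidden[k] for k in range(len(hidden))))
            for j in range(len(hidden))
        ]
    return hidden


# Hypothetical setup: 2 input features per time step, 2 hidden units.
w_in = [[0.5, 0.0], [0.0, 0.5]]
w_rec = identity(2)  # the identity initialization that gives the IRNN its name
window = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # three time steps in one window
final_state = irnn_forward(window, w_in, w_rec)
print(final_state)
```

With an identity recurrent matrix, the previous state is carried forward unchanged at initialization, which is the property intended to ease learning of long-term dependencies.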

