Problems of KDD Cup 99 Dataset Existed and Data Preprocessing

2014, Vol 667, pp. 218-225
Author(s): Yan Wang, Kun Yang, Xiang Jing, Huang Long Jin

The KDD Cup 99 dataset is not only the most widely used dataset in intrusion detection but also the de facto benchmark for evaluating the performance of intrusion detection systems. Nevertheless, the dataset has many issues that cannot be ignored. To build good data mining models for intrusion detection and to find the features that characterize each type of network attack, researchers need a thorough understanding of this dataset. In this paper, we first analyze in depth the problems that exist in the dataset and give corresponding solutions. Second, we carry out extensive data preprocessing on the 10% subset of the KDD Cup 99 training set, which yields better results in the subsequent processing. Finally, by comparing 10 common data mining algorithms in our experiments, we conclude that data preprocessing plays a vital role in the performance of data mining algorithms.
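
The abstract does not list the specific problems or preprocessing steps. One widely reported issue with KDD Cup 99 is the large number of exact duplicate records, so the sketch below shows duplicate removal as one plausible preprocessing step, not as the authors' method. The sample rows are made up and shortened to a few columns; real records carry 41 features plus a label.

```python
import csv
import io


def drop_duplicate_records(rows):
    """Return rows with exact duplicates removed, keeping first occurrences in order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # lists are unhashable, so use a tuple as the set key
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Illustrative sample only (hypothetical, truncated columns), not real dataset records.
SAMPLE = """0,tcp,http,SF,181,5450,normal.
0,tcp,http,SF,181,5450,normal.
0,icmp,ecr_i,SF,1032,0,smurf.
0,icmp,ecr_i,SF,1032,0,smurf.
0,udp,private,SF,105,146,normal.
"""

rows = list(csv.reader(io.StringIO(SAMPLE)))
unique_rows = drop_duplicate_records(rows)
print(f"{len(rows)} records read, {len(unique_rows)} unique")
```

Keeping the first occurrence preserves the original record order, which matters if later steps split the data by position.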

2021
Author(s): Neeraj Kumar, Upendra Kumar

Information and Communication Technologies have, to a large extent, a major influence on our social life, our economy, and worldwide security, and computer networks are central to Information Technology. However, the world is never free from people with malicious intent, such as cyber criminals and network intruders. To counter them, an Intrusion Detection System (IDS) plays a significant role in identifying network intrusions by performing various data analysis tasks. To develop a robust and accurate IDS, many papers published over the years have used different classification techniques from Data Mining (DM) and Machine Learning (ML), including hybrid approaches. The present paper is an in-depth analysis of two focal aspects of Network Intrusion Detection Systems: pre-processing methods in the form of dimensionality reduction, and an assortment of classification techniques. It also includes a comparative algorithmic analysis of the DM and ML techniques applied to design an intelligent IDS. An experimental comparative analysis was carried out in Python, with the 'kddcup99' dataset as the benchmark, to support the conclusions of this work. The experiments showed a notable impact from dimensionality reduction, and MLP performed well at correct classification, which helps establish a secure network. The aim of this work is to detect different kinds of malware as early and as accurately as possible, and to give a clearer view of the various existing techniques that may help interested researchers in future work.


2022, Vol 2161 (1), pp. 012043
Author(s): Ananya Devarakonda, Nilesh Sharma, Prita Saha, S Ramya

As most of the population acquires access to the internet, protecting online identity from threats to confidentiality, integrity, and accessibility becomes an increasingly important problem to tackle. By definition, a network intrusion detection system (IDS) helps pinpoint and identify anomalous network traffic in order to surface and classify suspicious activity. It is a fundamental part of network security and provides the first line of defense against a potential attack by alerting an administrator or other appropriate personnel to possible malicious network activity. Several academic publications propose artificial intelligence (AI) methods for accurate network intrusion detection. This paper outlines and compares four AI methods trained on two benchmark datasets: KDD'99 and NSL-KDD. Apart from model selection, data preprocessing plays a vital role in contributing to accurate solutions, and thus we propose a simple yet effective data preprocessing method. We also evaluate and compare the accuracy and performance of four popular models: decision tree (DT), multi-layer perceptron (MLP), random forest (RF), and stacked autoencoder (SAE). Of the four methods, the random forest classifier showed the most consistent and accurate results.


2010, Vol 20-23, pp. 867-871
Author(s): Li Li, Ye Yuan

Most intrusion detection systems (IDS) are particular about their data source, which may be required to be categorical or to be correctly labeled. Data preprocessing is therefore an indispensable part of intrusion detection. The KDD Cup 1999 dataset is commonly used as experimental data. This paper briefly introduces the features and structure of the KDD Cup 1999 dataset and presents a data preprocessing method for an intrusion detection system based on a neural network clustering algorithm.
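
The abstract does not describe the preprocessing method itself. As a rough illustration of what neural-network-oriented preprocessing commonly involves, the sketch below one-hot encodes a symbolic feature and min-max scales a numeric one. KDD Cup 1999 records include symbolic features such as protocol_type, service, and flag. The helper names `one_hot` and `min_max` and the sample values are hypothetical, and the choice of encoding and scaling is an assumption, not the authors' procedure.

```python
def one_hot(values):
    """Encode symbolic values as one-hot vectors; categories are sorted for a stable order."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    vectors = [
        [1.0 if index[value] == i else 0.0 for i in range(len(categories))]
        for value in values
    ]
    return vectors, categories


def min_max(values):
    """Scale numeric values linearly into [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(value - lo) / (hi - lo) for value in values]


# Made-up values standing in for the protocol_type and src_bytes columns.
protocols = ["tcp", "udp", "tcp", "icmp"]
src_bytes = [181.0, 105.0, 239.0, 1032.0]

encoded, categories = one_hot(protocols)
scaled = min_max(src_bytes)
print(categories)
print(encoded[0])
print(scaled)
```

Scaling keeps large-valued features such as byte counts from dominating distance-based clustering, which is one reason this step usually precedes neural network training.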

