Evaluation of K-Means Clustering for Effective Intrusion Detection and Prevention in Massive Network Traffic Data

2014 ◽  
Vol 96 (7) ◽  
pp. 9-14 ◽  
Author(s):  
Kamini Nalavade ◽  
B. B. Meshram
Author(s):  
Yu Wang

In this chapter we will focus on examining computer network traffic and data. A computer network connects a set of computers, physically and logically, so that they can exchange information. Network traffic acquired from a network system provides information on data communications within the network and between networks or individual computers. The most common data types are log data, such as Kerberos logs, transmission control protocol/Internet protocol (TCP/IP) logs, central processing unit (CPU) usage data, event logs, user command data, Internet visit data, operating system audit trail data, intrusion detection and prevention service (IDS/IPS) logs, NetFlow data, and simple network management protocol (SNMP) reporting data. Such information is unique and valuable for network security, specifically for intrusion detection and prevention. Although we have already presented some essential challenges in collecting such data in Chapter I, we will discuss traffic data, as well as other related data, in greater detail in this chapter. Specifically, we will describe system-specific and user-specific data types in the sections System-Specific Data and User-Specific Data, respectively, and provide detailed information on publicly available data in the section Publicly Available Data.


2011 ◽  
Vol 11 (2) ◽  
pp. 2042-2056 ◽  
Author(s):  
Emilio Corchado ◽  
Álvaro Herrero

2021 ◽  
pp. 111-121
Author(s):  
Giuseppina Andresini ◽  
Annalisa Appice ◽  
Corrado Loglisci ◽  
Vincenzo Belvedere ◽  
Domenico Redavid ◽  
...  

2021 ◽  
pp. 1-18
Author(s):  
Satish Kumar ◽  
Sunanda Gupta ◽  
Sakshi Arora

Network intrusion detection systems (NIDS) detect malicious and intrusive information in computer networks. Presently, commercial NIDS are based on machine learning approaches that use complex algorithms to increase intrusion detection efficiency and efficacy. These machine learning-based NIDS use high-dimensional network traffic data from which intrusive information is to be detected. This high-dimensional network traffic data needs to be preprocessed and normalized to make it suitable for machine learning tools. A machine learning approach with appropriate normalization and preprocessing increases NIDS performance. This paper presents an empirical study of various normalization methods applied to a benchmark network traffic dataset, KDD Cup’99, that has been used to evaluate the NIDS model. The study shows that decimal normalization yields better prediction performance than non-normalized traffic data when traffic is categorized into ‘normal’ or ‘intrusive’ classes.
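The abstract does not spell out how decimal normalization is computed. As a minimal sketch, assuming it refers to the common "decimal scaling" definition (divide each feature by the smallest power of ten that brings every absolute value below 1), the idea could look like this in Python. The function name `decimal_scaling` and the sample byte counts are illustrative assumptions, not taken from the paper.

```python
def decimal_scaling(values):
    """Scale one numeric feature by a power of ten.

    Assumed definition: x' = x / 10**j, where j is the smallest
    non-negative integer such that max(|x'|) < 1.
    Returns the scaled values together with the exponent j.
    """
    max_abs = max(abs(v) for v in values)
    j = 0
    # Increase the exponent until the largest magnitude drops below 1.
    while max_abs / (10 ** j) >= 1:
        j += 1
    return [v / (10 ** j) for v in values], j


# Made-up byte counts standing in for one traffic feature.
raw_feature = [986, -45, 7]
scaled, exponent = decimal_scaling(raw_feature)
print(scaled, exponent)
```

Because each feature is only divided by a power of ten, the relative spacing of its values is preserved, which is one reason this scheme is sometimes preferred when features span very different magnitudes.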


Author(s):  
Yu Wang

Increasing the accuracy of classification has been a constant challenge in the network security area. As the volume of network traffic grows rapidly and network bandwidth advances, many classification algorithms used for intrusion detection and prevention face high false positive and false negative rates. A stream of network traffic data with many positive predictors might not necessarily represent a true attack, and a seemingly anomaly-free stream could represent a novel attack. Depending on the infrastructure of a network system, traffic data can become very large. With such large volumes of data, even a very low misclassification rate can yield a large number of alarms; for example, a system handling 22 million traffic events per hour with a 1% misclassification rate could generate roughly 61 alarms per second (220,000 per hour, excluding repeated connections). Reviewing and validating every such case is not practical. To address this challenge we can improve the data collection process and develop more robust algorithms. Unlike other research areas, such as the life sciences, healthcare, or economics, where an analysis can be achieved based on a single statistical approach, a robust intrusion detection scheme needs to be constructed hierarchically with multiple algorithms. For example, user behavior can be profiled and classified hierarchically using hybrid algorithms (e.g., combining statistics and AI). In addition, we can improve the precision of classification by carefully evaluating the results. Several key elements are important for statistical evaluation in classification and prediction, such as reliability, sensitivity, specificity, misclassification, and goodness-of-fit. We also need to evaluate the goodness of the data (consistency and repeatability), goodness of the classification, and goodness of the model. We will discuss these topics in this chapter.
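To make the alarm-volume argument and the evaluation measures concrete, here is a small Python sketch. It takes the abstract's figures at face value (22 million traffic events per hour, 1% misclassification) and uses the standard confusion-matrix definitions of sensitivity, specificity, and misclassification rate. The function names and the confusion-matrix counts are hypothetical, chosen only for illustration.

```python
def alarms_per_second(events_per_hour, misclassification_rate):
    """Expected misclassified events per second for a given hourly volume."""
    return events_per_hour * misclassification_rate / 3600


def sensitivity(tp, fn):
    """True positive rate: share of real attacks that are flagged."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: share of normal traffic that is left alone."""
    return tn / (tn + fp)


def misclassification(tp, fp, tn, fn):
    """Share of all events that are labeled incorrectly."""
    return (fp + fn) / (tp + fp + tn + fn)


# Figures from the abstract: 22 million events per hour, 1% misclassified.
print(round(alarms_per_second(22_000_000, 0.01), 1))  # about 61 per second

# Hypothetical confusion-matrix counts, for illustration only.
tp, fn, tn, fp = 90, 10, 9800, 100
print(sensitivity(tp, fn), specificity(tn, fp), misclassification(tp, fp, tn, fn))
```

Even with a misclassification rate that looks small on paper, the per-second figure shows why manual review of every alarm does not scale, and why sensitivity and specificity need to be examined separately rather than folded into a single accuracy number.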

