A K-means algorithm based on characteristics of density applied to network intrusion detection

2020 ◽  
Vol 17 (2) ◽  
pp. 665-687
Author(s):  
Jing Xu ◽  
Dezhi Han ◽  
Kuan-Ching Li ◽  
Hai Jiang

K-means algorithms are a group of popular unsupervised algorithms widely used for cluster analysis. However, the results of traditional K-means clustering algorithms are strongly affected by the choice of initial cluster centers, and their accuracy is unstable and their speed low, which makes it hard for the algorithm to meet the requirements of Big Data. In this paper, a modernized version of the K-means algorithm that uses density to select the initial clustering seeds is proposed. First, a Kd-tree is used to divide the hyper-rectangle space during data pre-processing, so that points close to each other are grouped into the same sub-tree, and the generalized information is stored in the tree structure. In addition, an improved Kd-tree nearest neighbor search is used in the K-means algorithm to prune the search space and speed up the operation. The clustering results show that the clusters are stable and accurate when the numbers of clusters and iterations are constant. Experimental results in the network intrusion detection case show that the improved version of the K-means algorithm performs better in terms of detection rate and false rate.
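The abstract does not give the seeding rule itself, so the following is only a minimal sketch of one way density could drive the choice of initial centers. The function name `density_seeds`, the `radius` parameter, and the "density times distance to the nearest chosen seed" score are assumptions made for illustration, and a brute-force neighbor count stands in for the Kd-tree search the paper describes.

```python
import math

def density_seeds(points, k, radius):
    """Pick k initial centers that are dense and far from seeds already chosen.

    Hypothetical illustration, not the paper's exact procedure.
    """
    # Local density = number of points (including the point itself) within `radius`.
    # Brute force, O(n^2); a Kd-tree range query would replace this loop in practice.
    density = [sum(1 for q in points if math.dist(p, q) <= radius) for p in points]

    # First seed: the densest point (ties go to the earliest index).
    seeds = [max(range(len(points)), key=lambda i: density[i])]

    while len(seeds) < k:
        # Score = density * distance to the nearest chosen seed, so the next
        # seed is both in a dense region and well separated from earlier seeds.
        def score(i):
            return density[i] * min(math.dist(points[i], points[s]) for s in seeds)
        seeds.append(max(range(len(points)), key=score))

    return [points[i] for i in seeds]
```

Because the procedure is deterministic, repeated runs on the same data start K-means from the same centers, which is the kind of stability the abstract attributes to density-based seeding.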

Author(s):  
SHI ZHONG ◽  
TAGHI M. KHOSHGOFTAAR ◽  
NAEEM SELIYA

Recently, data mining methods have gained importance in addressing network security issues, including network intrusion detection, a challenging task in network security. Intrusion detection systems aim to identify attacks with a high detection rate and a low false alarm rate. Classification-based data mining models for intrusion detection are often ineffective in dealing with dynamic changes in intrusion patterns and characteristics. Consequently, unsupervised learning methods have been given a closer look for network intrusion detection. We investigate multiple centroid-based unsupervised clustering algorithms for intrusion detection, and propose a simple yet effective self-labeling heuristic for detecting attack and normal clusters of network traffic audit data. The clustering algorithms investigated include k-means, Mixture-Of-Spherical Gaussians, Self-Organizing Map, and Neural-Gas. The network traffic datasets provided by the DARPA 1998 offline intrusion detection project are used in our empirical investigation, which demonstrates the feasibility and promise of unsupervised learning methods for network intrusion detection. In addition, a comparative analysis shows the advantage of clustering-based methods over supervised classification techniques in identifying new or unseen attack types.
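The abstract names a self-labeling heuristic without describing it. As a hedged sketch of what such a heuristic might look like, the code below assumes that normal traffic dominates the data: the largest cluster is taken as normal, and clusters are then labeled normal in order of centroid distance from it until an assumed fraction of all instances is covered. The function `self_label` and the parameter `normal_fraction` are hypothetical.

```python
import math

def self_label(centroids, sizes, normal_fraction=0.9):
    """Label each cluster 'normal' or 'attack' without ground-truth labels.

    Assumption (not from the abstract): normal traffic is the majority, and
    clusters near the largest cluster are also normal.
    """
    total = sum(sizes)
    # The largest cluster anchors the notion of "normal".
    anchor = max(range(len(sizes)), key=lambda i: sizes[i])
    # Visit clusters from nearest to farthest from the anchor centroid.
    order = sorted(range(len(sizes)),
                   key=lambda i: math.dist(centroids[i], centroids[anchor]))

    labels = ["attack"] * len(sizes)
    covered = 0
    for i in order:
        if covered >= normal_fraction * total:
            break  # enough instances are already labeled normal
        labels[i] = "normal"
        covered += sizes[i]
    return labels
```

The value of `normal_fraction` encodes a prior belief about how much of the traffic is benign, so it would need tuning for any real dataset.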


Electronics ◽  
2020 ◽  
Vol 9 (8) ◽  
pp. 1206
Author(s):  
Hui Xu ◽  
Krzysztof Przystupa ◽  
Ce Fang ◽  
Andrzej Marciniak ◽  
Orest Kochan ◽  
...  

With the widespread use of the Internet, network security issues have attracted more and more attention, and network intrusion detection has become one of the main security technologies. In network intrusion detection, the original data source usually has high dimensionality and a large volume, both of which strongly affect efficiency and accuracy. Thus, both feature selection and the classifier play a significant role in raising the performance of network intrusion detection. This paper considers the classification optimization results of weighted K-nearest neighbor (KNN) together with those of the feature selection algorithm, and proposes a combination strategy of feature selection based on an integrated optimization algorithm and weighted KNN, in order to improve the performance of network intrusion detection. Experimental results show that the weighted KNN alone can increase the efficiency at the expense of a small amount of accuracy, while the proposed combination strategy of feature selection based on an integrated optimization algorithm and weighted KNN can improve both the efficiency and the accuracy of network intrusion detection.
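The abstract does not say how the KNN is weighted. A common variant, shown here purely as an illustration, weights each neighbor's vote by the inverse of its distance to the query; the paper's own weighting may differ (for example, it could weight features rather than neighbors). The function name and the `eps` smoothing term are assumptions.

```python
import math
from collections import defaultdict

def weighted_knn_predict(train, query, k=3, eps=1e-9):
    """Distance-weighted KNN vote.

    train: list of (feature_vector, label) pairs.
    Returns the label with the largest sum of inverse-distance weights.
    """
    # Keep the k training samples closest to the query.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]

    votes = defaultdict(float)
    for features, label in nearest:
        # Closer neighbors contribute more; eps avoids division by zero.
        votes[label] += 1.0 / (math.dist(features, query) + eps)
    return max(votes, key=votes.get)
```

With this weighting, one very close neighbor can outvote two distant ones, which a plain majority vote would not allow.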


Author(s):  
Peyman Kabiri ◽  
Ali Ghorbani

With recent advances in network-based technology and the increased dependency of our everyday life on this technology, assuring reliable operation of network-based systems is very important. In recent years, the number of attacks on networks has increased dramatically, and consequently interest in network intrusion detection has grown among researchers. During the past few years, different approaches for collecting a dataset of network features, each with its own assumptions, have been proposed to detect network intrusions. Recently, many research works have focused on better understanding the network feature space in order to arrive at a better detection method. The curse of dimensionality is still a major obstacle for researchers in network intrusion detection. In this chapter, the DARPA'99 dataset is used for the study. Features in that dataset are analyzed with respect to their information value. Using the information value of the features, the number of dimensions in the data is reduced. Then, using several clustering algorithms, the effects of the dimension reduction on the dataset are studied and the results are reported.
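The chapter's exact definition of "information value" is not given in the abstract. As a stand-in, the sketch below ranks discrete features by information gain (the reduction in label entropy after splitting on a feature) and keeps the top few. The helper names are hypothetical, and continuous features would need to be discretized first.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(column, labels):
    """Entropy of the labels minus the weighted entropy after splitting on `column`."""
    n = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

def top_features(rows, labels, keep):
    """Return the indices of the `keep` features with the highest information gain."""
    n_features = len(rows[0])
    gains = [information_gain([r[j] for r in rows], labels) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: gains[j], reverse=True)[:keep]
```

Dropping the low-gain columns before clustering is one simple way to reduce the number of dimensions as the chapter describes.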


2015 ◽  
Vol 2015 ◽  
pp. 1-21 ◽  
Author(s):  
Singh Vijendra ◽  
Sahoo Laxman

We present a multiobjective genetic clustering approach, in which data points are assigned to clusters based on a new line symmetry distance. The proposed algorithm is called multiobjective line symmetry based genetic clustering (MOLGC). Two objective functions are used: the Davies-Bouldin (DB) index and a line symmetry distance based objective function. The proposed algorithm evolves near-optimal clustering solutions using multiple clustering criteria, without a priori knowledge of the actual number of clusters. A nearest neighbor search based on multiple randomized K-dimensional (Kd) trees is used to reduce the complexity of finding the closest symmetric points. Experimental results based on several artificial and real data sets show that the proposed clustering algorithm can obtain optimal clustering solutions in terms of different cluster quality measures in comparison to the existing SBKM and MOCK clustering algorithms.
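For reference, the Davies-Bouldin index named above can be computed as in the following minimal sketch. It uses the standard definition with Euclidean distance: for each cluster, take the worst ratio of summed within-cluster scatter to between-centroid distance, then average over clusters. Lower values indicate more compact, better separated clusters. The function names are illustrative, and every cluster is assumed to be non-empty with distinct centroids.

```python
import math

def centroid(cluster):
    """Component-wise mean of a non-empty list of points."""
    dim = len(cluster[0])
    return tuple(sum(p[d] for p in cluster) / len(cluster) for d in range(dim))

def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters (each a list of points)."""
    centers = [centroid(c) for c in clusters]
    # Scatter: mean distance from each point to its own cluster centroid.
    scatter = [sum(math.dist(p, m) for p in c) / len(c)
               for c, m in zip(clusters, centers)]
    k = len(clusters)
    # For each cluster, the worst (largest) similarity ratio to any other cluster.
    worst = [
        max((scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(k) if j != i)
        for i in range(k)
    ]
    return sum(worst) / k
```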


2018 ◽  
Vol 2018 ◽  
pp. 1-17 ◽  
Author(s):  
Tomáš Bajtoš ◽  
Andrej Gajdoš ◽  
Lenka Kleinová ◽  
Katarína Lučivjanská ◽  
Pavol Sokol

With the increase in usage of computer systems and computer networks, the problem of intrusion detection in network security has become an important issue. In this paper, we discuss approaches that simplify a network administrator's work. We applied clustering methods for security incident profiling. We consider the K-means, PAM, and CLARA clustering algorithms. For this purpose, we used data collected in the Warden system from various security tools. We do not aim to differentiate between normal and abnormal network traffic, but we focus on grouping similar threat agents based on attributes of security events. We suggest a case of a fine classification and a case of a coarse classification and discuss the advantages of both cases.
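For readers unfamiliar with PAM (Partitioning Around Medoids), the sketch below shows only its swap idea: cluster centers are actual data points (medoids), and a medoid is exchanged for a non-medoid whenever the swap lowers the total distance of points to their nearest medoid. This is a simplified illustration, not the implementation used in the paper: the initial medoids are simply the first `k` points (the BUILD phase of PAM is omitted), and Euclidean distance is assumed.

```python
import math

def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)

def pam(points, k):
    """Greedy medoid swapping until no single swap lowers the total cost."""
    medoids = list(points[:k])  # naive initialization (assumption)
    improved = True
    while improved:
        improved = False
        best = total_cost(points, medoids)
        for i in range(k):
            for candidate in points:
                if candidate in medoids:
                    continue
                # Try replacing medoid i with the candidate point.
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                cost = total_cost(points, trial)
                if cost < best:  # strict decrease guarantees termination
                    medoids, best, improved = trial, cost, True
    return medoids
```

CLARA applies the same procedure to random samples of the data, which is why it scales to larger event collections than PAM run on the full dataset.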


2015 ◽  
Vol 09 (03) ◽  
pp. 307-331 ◽  
Author(s):  
Wei Zhang ◽  
Gongxuan Zhang ◽  
Yongli Wang ◽  
Zhaomeng Zhu ◽  
Tao Li

Nearest neighbor search is a key technique used in hierarchical clustering, and its computational complexity determines the performance of the hierarchical clustering algorithm. The time complexity of standard agglomerative hierarchical clustering is O(n^3), while the time complexity of more advanced hierarchical clustering algorithms (such as nearest neighbor chain, SLINK and CLINK) is O(n^2). This paper presents a new nearest neighbor search method called nearest neighbor boundary (NNB), which first divides a large dataset into independent subsets and then finds the nearest neighbor of each point within its subset. When NNB is used, the time complexity of hierarchical clustering can be reduced to O(n log^2 n). Based on NNB, we propose a fast hierarchical clustering algorithm called nearest-neighbor boundary clustering (NBC), and the proposed algorithm can be adapted to the parallel and distributed computing framework. The experimental results demonstrate that our algorithm is practical for large datasets.
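To make the cubic baseline concrete, here is a minimal sketch of naive agglomerative clustering with single linkage. Each merge step rescans all cluster pairs to find the closest one, which is where the O(n^3) total cost comes from. The function name and the stopping rule (stop at `k` clusters) are choices made for illustration; this is the baseline the paper improves on, not NNB or NBC.

```python
import math

def single_linkage(points, k):
    """Naive agglomerative clustering: merge the two closest clusters until k remain."""
    clusters = [[p] for p in points]  # start with one cluster per point
    while len(clusters) > k:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the two closest members.
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge cluster j into cluster i (j > i, so index i stays valid).
        clusters[i].extend(clusters.pop(j))
    return clusters
```

The inner search for the closest pair is the nearest neighbor step; faster methods avoid recomputing it from scratch after every merge.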


2021 ◽  
Vol 2021 ◽  
pp. 1-22
Author(s):  
Xin Li ◽  
Peng Yi ◽  
Wei Wei ◽  
Yiming Jiang ◽  
Le Tian

As an important part of intrusion detection, feature selection plays a significant role in improving detection performance. The krill herd (KH) algorithm is an efficient swarm intelligence algorithm with excellent performance in data mining. To solve the problem of low efficiency and high false positive rate in intrusion detection caused by increasingly high-dimensional data, an improved krill herd algorithm based on linear nearest neighbor lasso step (LNNLS-KH) is proposed for feature selection in network intrusion detection. The number of selected features and the classification accuracy are introduced into the fitness evaluation function of the LNNLS-KH algorithm, and the physical diffusion motion of the krill individuals is transformed by a nonlinear method. Meanwhile, the linear nearest neighbor lasso step optimization is performed on the updated krill herd position in order to derive the global optimal solution. Experiments show that the LNNLS-KH algorithm retains 7 features in the NSL-KDD dataset and 10.2 features in the CICIDS2017 dataset on average, which effectively eliminates redundant features while ensuring high detection accuracy. Compared with the CMPSO, ACO, KH, and IKH algorithms, it reduces features by 44%, 42.86%, 34.88%, and 24.32% in the NSL-KDD dataset, and by 57.85%, 52.34%, 27.14%, and 25% in the CICIDS2017 dataset, respectively. The classification accuracy increased by 10.03% and 5.39%, and the detection rate increased by 8.63% and 5.45%. The time of intrusion detection decreased by 12.41% and 4.03% on average. Furthermore, the LNNLS-KH algorithm quickly jumps out of local optimal solutions and shows good performance in the optimal fitness iteration curve, convergence speed, and false positive rate of detection.
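The abstract states that the fitness function combines the number of selected features with classification accuracy but does not give the formula. One common form for wrapper-style feature selection, shown here only as an assumed example, is a weighted sum of classification error and the fraction of features kept. The weight `alpha` and the function name are hypothetical.

```python
def fitness(error_rate, n_selected, n_total, alpha=0.9):
    """Lower is better: weighted sum of classification error and feature ratio.

    alpha is an assumed trade-off weight, not a value taken from the paper.
    """
    if not 0 < n_selected <= n_total:
        raise ValueError("n_selected must be in the range 1..n_total")
    return alpha * error_rate + (1 - alpha) * n_selected / n_total
```

For example, with the 41 features of NSL-KDD and equal error rates, a subset of 7 features scores better than a subset of 20, so the search is pushed toward smaller subsets unless accuracy suffers.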

