A Comprehensive Data Sampling Analysis Applied to the Classification of Rare IoT Network Intrusion Types

Author(s):  
Suchet Sapre ◽  
Khondkar Islam ◽  
Pouyan Ahmadi
2014 ◽  
Vol 643 ◽  
pp. 99-104
Author(s):  
Jin Yang ◽  
Yun Jie Li ◽  
Qin Li

This paper analyzes how network intrusion behaviors develop and change. An improved epidemic spreading model is proposed to study how aggressive behaviors spread, to predict the future course of an outbreak, and to evaluate strategies for controlling a network epidemic. Concepts and formal definitions of immune cells are given, based on Artificial Immune Systems. A forecasting algorithm based on Markov chain theory is proposed to improve the precision of network risk forecasting. The data of the memory cells are analyzed directly to form several state spaces, which can be used to predict network risk by analyzing cell status and classifying the optimal state. Experimental results show that the proposed model supports real-time processing for network situation awareness.
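As a rough illustration of Markov-chain forecasting in general (not the authors' algorithm), the sketch below estimates first-order transition probabilities from an observed sequence of risk states and propagates a starting distribution forward. The state names, the example history, and the function names `estimate_transitions` and `forecast` are all assumptions made for this example.

```python
from collections import Counter, defaultdict

def estimate_transitions(states):
    """Estimate first-order transition probabilities from a state sequence."""
    counts = defaultdict(Counter)
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    return {
        s: {t: c / sum(row.values()) for t, c in row.items()}
        for s, row in counts.items()
    }

def forecast(distribution, transitions, steps=1):
    """Propagate a probability distribution over states forward by `steps`."""
    for _ in range(steps):
        nxt = defaultdict(float)
        for state, p in distribution.items():
            # A state with no observed outgoing transition is treated as absorbing.
            for target, q in transitions.get(state, {state: 1.0}).items():
                nxt[target] += p * q
        distribution = dict(nxt)
    return distribution

# Hypothetical risk-level history, for illustration only.
history = ["low", "low", "medium", "high", "medium", "low", "low", "medium"]
P = estimate_transitions(history)
two_step = forecast({"medium": 1.0}, P, steps=2)
print(two_step)
```

The forecast is only as good as the transition estimates, so a real system would need far more observations per state than this toy history provides.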


Author(s):  
Preethi D. ◽  
Neelu Khare

This chapter presents an ensemble-based feature selection with long short-term memory (LSTM) model. A deep recurrent learning model is proposed for classifying network intrusions. The model uses ensemble-based feature selection (EFS) to select appropriate features from the dataset and LSTM to classify network intrusions. The EFS combines five feature selection techniques: information gain, gain ratio, chi-square, correlation-based feature selection, and symmetric uncertainty-based feature selection. The experiments were conducted on the standard benchmark NSL-KDD dataset and implemented using TensorFlow and Python. The proposed model is evaluated using classification performance metrics and compared against LSTM classification with all 41 features (no feature selection) and with each individual feature selection technique. The performance study showed that the proposed model performs better, with 99.8% accuracy, a higher detection rate, and a lower false alarm rate.
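The abstract does not say how the five selectors are combined, so the following is only one plausible scheme: each selector votes for its top-ranked features, and a feature is kept if enough selectors agree. The voting rule, the function name `ensemble_select`, the feature names `f1` to `f4`, and every score are made up for illustration.

```python
from collections import Counter

def ensemble_select(rankings, top_k, min_votes):
    """Keep features that appear in the top_k of at least min_votes selectors."""
    votes = Counter()
    for scores in rankings.values():
        top = sorted(scores, key=scores.get, reverse=True)[:top_k]
        votes.update(top)
    return sorted(f for f, v in votes.items() if v >= min_votes)

# Made-up scores; real values would come from running each selector on the data.
rankings = {
    "information_gain":      {"f1": 0.9, "f2": 0.7, "f3": 0.2, "f4": 0.1},
    "gain_ratio":            {"f1": 0.8, "f2": 0.3, "f3": 0.6, "f4": 0.1},
    "chi_square":            {"f1": 0.7, "f2": 0.9, "f3": 0.1, "f4": 0.2},
    "cfs":                   {"f1": 0.6, "f2": 0.5, "f3": 0.4, "f4": 0.3},
    "symmetric_uncertainty": {"f1": 0.9, "f2": 0.8, "f3": 0.3, "f4": 0.2},
}
selected = ensemble_select(rankings, top_k=2, min_votes=3)
print(selected)
```

The selected subset would then be the input to the LSTM classifier in place of the full 41 features.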


2020 ◽  
Vol 8 (2) ◽  
pp. 89-93 ◽  
Author(s):  
Hairani Hairani ◽  
Khurniawan Eko Saputro ◽  
Sofiansyah Fadli

Class imbalance in a dataset biases classification results toward the class with the most data (the majority class). A sampling method is needed to balance the minority class (positive class) so that the class distribution becomes balanced, leading to better classification results. This study addresses the class imbalance problem on the Pima Indians diabetes dataset using k-means-SMOTE. The dataset has 268 instances of the positive class (minority class) and 500 instances of the negative class (majority class). Classification was performed by comparing C4.5, SVM, and naïve Bayes with k-means-SMOTE applied for data sampling. With k-means-SMOTE, SVM achieved the highest accuracy and sensitivity, at 82% and 77% respectively, while naïve Bayes produced the highest specificity, at 89%.
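To make the oversampling idea concrete, here is a minimal sketch of the SMOTE-style interpolation step only; it omits the k-means clustering stage that k-means-SMOTE adds, and it is not the study's implementation. The toy points and the function name `smote_like` are assumptions. With the counts in the abstract, fully balancing the classes would take 500 - 268 = 232 synthetic minority samples.

```python
import random

def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic

# Toy two-feature minority samples, for illustration only.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like(minority, n_new=5)
print(new_points)
```

Each synthetic point lies on a line segment between two existing minority samples, so oversampling stays inside the region the minority class already occupies.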


2016 ◽  
Vol 2016 ◽  
pp. 1-6 ◽  
Author(s):  
Shanxiong Chen ◽  
Maoling Peng ◽  
Hailing Xiong ◽  
Xianping Yu

Intrusion detection must handle large amounts of data; network intrusion detection in particular has to inspect all network data. Massive data processing is the bottleneck for the network software and hardware used in intrusion detection. If the data dimension can be reduced at the data sampling stage and the feature information of network data obtained directly, detection efficiency can be improved greatly. This paper presents an SVM intrusion detection model based on compressive sampling. The compressed sampling method from compressed sensing theory is used to compress the features of network data flows, yielding a refined sparse representation. An SVM then classifies the compressed results. The method can detect anomalous network behavior quickly without reducing classification accuracy.
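The core operation in compressive sampling is a linear measurement y = Φx that maps a high-dimensional, sparse vector to far fewer values. The sketch below shows that step with a random Gaussian measurement matrix, which is a common choice in compressed sensing but is an assumption here, as are the dimensions, the toy vector, and the function names. The abstract does not specify the matrix the authors used.

```python
import random

def measurement_matrix(m, n, seed=0):
    """Build an m x n random Gaussian matrix with entries scaled by 1/sqrt(m)."""
    rng = random.Random(seed)
    scale = 1.0 / m ** 0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(m)]

def compress(phi, x):
    """Compute the compressed measurement vector y = phi @ x."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]

# Toy sparse feature vector: 40 dimensions, only 3 non-zero entries.
x = [0.0] * 40
x[3], x[17], x[29] = 5.0, -2.0, 1.5

phi = measurement_matrix(m=10, n=40)
y = compress(phi, x)
print(len(x), "->", len(y))
```

In a pipeline like the one described, the compressed vectors `y` rather than the raw features would be passed to the SVM.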


2016 ◽  
Vol 3 (2) ◽  
pp. 139-148
Author(s):  
M Rizky Wijaya ◽  
Ristu Saptono ◽  
Afrizal Doewes

Diabetes can lead to mortality and disability, so patients may need to be readmitted for further treatment. Previous research on feature selection with greedy stepwise forward search failed to predict patient readmission, yielding recall and precision of 0 with training splits of 60%, 75%, 80%, and 90%, and suggested addressing the class imbalance between 6,293 readmitted records and 64,141 others. This research aims to determine the effect of model selection using best-first search instead of greedy stepwise forward search, and of data sampling with SpreadSubsample, on the class imbalance problem. The data used were patient records from 130 American hospitals from 1999 to 2008, comprising 70,434 records. The methods used were best-first search and SpreadSubsample. With the best-first method, precision was 0.4 and 0.333 on the 75% and 90% training splits, while the SpreadSubsample method increased precision and recall more significantly. SpreadSubsample had a greater effect on precision and recall than the best-first method.
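The undersampling idea can be sketched as follows: cap every class at a multiple of the smallest class's size and randomly drop the excess. This is a simplified stand-in written for illustration, not the filter used in the study. The function name `spread_subsample`, the `max_spread` parameter, and the placeholder records are assumptions; only the class counts (6,293 and 64,141) come from the abstract.

```python
import random
from collections import defaultdict

def spread_subsample(records, label_of, max_spread=1.0, seed=0):
    """Randomly undersample so that no class has more than
    max_spread times as many records as the smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for record in records:
        by_class[label_of(record)].append(record)
    smallest = min(len(rows) for rows in by_class.values())
    cap = int(smallest * max_spread)
    sample = []
    for rows in by_class.values():
        sample.extend(rng.sample(rows, cap) if len(rows) > cap else rows)
    return sample

# Placeholder records carrying only a class label and an index.
records = [("readmitted", i) for i in range(6293)]
records += [("other", i) for i in range(64141)]

balanced = spread_subsample(records, label_of=lambda r: r[0], max_spread=1.0)
print(len(balanced))
```

With `max_spread=1.0` both classes end up the same size; a larger value keeps more majority records at the cost of a less balanced training set.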

