A Feature Selection Approach for Network Intrusion Classification

Feature selection is a preprocessing step to machine learning, leads to increase the classification accuracy and reduce its complexity. Feature selection methods are classified into two main categories: filter and wrapper. Filter methods evaluate features without involving any learning algorithm, while wrapper methods depend on a learning algorithm for feature evaluation. Variety hybrid Filter and wrapper methods have been proposed in the literature. However, hybrid filter and wrapper approaches suffer from the problem of determining the cut-off point of the ranked features. This leads to decrease the classification accuracy by eliminating important features. In this paper the authors proposed a Hybrid Bi-Layer behavioral-based feature selection approach, which combines filter and wrapper feature selection methods. The proposed approach solves the cut-off point problem for the ranked features. It consists of two layers, at the first layer Information gain is used to rank the features and select a new set of features depending on a global maxima classification accuracy. Then, at the second layer a new subset of features is selected from within the first layer redacted data set by searching for a group of local maximum classification accuracy. To evaluate the proposed approach it is applied on NSL-KDD dataset, where the number of features is reduced from 41 to 34 features at the first layer. Then reduced from 34 to 20 features at the second layer, which leads to improve the classification accuracy to 99.2%.

Download Full-text

A Feature Selection Approach for Network Intrusion Detection Based on Tree-Seed Algorithm and K-Nearest Neighbor

2018 IEEE 4th International Symposium on Wireless Systems within the International Conferences on Intelligent Data Acquisition and Advanced Computing Systems (IDAACS-SWS) ◽

10.1109/idaacs-sws.2018.8525522 ◽

2018 ◽

Cited By ~ 3

Author(s):

Feng Chen ◽

Zhiwei Ye ◽

Chunzhi Wang ◽

Lingyu Yan ◽

Ruoxi Wang

Keyword(s):

Feature Selection ◽

Intrusion Detection ◽

Nearest Neighbor ◽

Network Intrusion Detection ◽

K Nearest Neighbor ◽

Network Intrusion ◽

Selection Approach ◽

Feature Selection Approach ◽

Tree Seed

Download Full-text

Early prediction of chronic disease using an efficient machine learning algorithm through adaptive probabilistic divergence based feature selection approach

International Journal of Pervasive Computing and Communications ◽

10.1108/ijpcc-04-2020-0018 ◽

2020 ◽

Vol ahead-of-print (ahead-of-print) ◽

Cited By ~ 2

Author(s):

Sandeepkumar Hegde ◽

Monica R. Mundada

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Chronic Disease ◽

Learning Algorithm ◽

Predictive Ability ◽

Experimental Result ◽

Data Set ◽

Content Type ◽

Learning Classifier ◽

Feature Selection Approach

Purpose According to the World Health Organization, by 2025, the contribution of chronic disease is expected to rise by 73% compared to all deaths and it is considered as global burden of disease with a rate of 60%. These diseases persist for a longer duration of time, which are almost incurable and can only be controlled. Cardiovascular disease, chronic kidney disease (CKD) and diabetes mellitus are considered as three major chronic diseases that will increase the risk among the adults, as they get older. CKD is considered a major disease among all these chronic diseases, which will increase the risk among the adults as they get older. Overall 10% of the population of the world is affected by CKD and it is likely to double in the year 2030. The paper aims to propose novel feature selection approach in combination with the machine-learning algorithm which can early predict the chronic disease with utmost accuracy. Hence, a novel feature selection adaptive probabilistic divergence-based feature selection (APDFS) algorithm is proposed in combination with the hyper-parameterized logistic regression model (HLRM) for the early prediction of chronic disease. Design/methodology/approach A novel feature selection APDFS algorithm is proposed which explicitly handles the feature associated with the class label by relevance and redundancy analysis. The algorithm applies the statistical divergence-based information theory to identify the relationship between the distant features of the chronic disease data set. The data set required to experiment is obtained from several medical labs and hospitals in India. The HLRM is used as a machine-learning classifier. The predictive ability of the framework is compared with the various algorithm and also with the various chronic disease data set. The experimental result illustrates that the proposed framework is efficient and achieved competitive results compared to the existing work in most of the cases. Findings The performance of the proposed framework is validated by using the metric such as recall, precision, F1 measure and ROC. The predictive performance of the proposed framework is analyzed by passing the data set belongs to various chronic disease such as CKD, diabetes and heart disease. The diagnostic ability of the proposed approach is demonstrated by comparing its result with existing algorithms. The experimental figures illustrated that the proposed framework performed exceptionally well in prior prediction of CKD disease with an accuracy of 91.6. Originality/value The capability of the machine learning algorithms depends on feature selection (FS) algorithms in identifying the relevant traits from the data set, which impact the predictive result. It is considered as a process of choosing the relevant features from the data set by removing redundant and irrelevant features. Although there are many approaches that have been already proposed toward this objective, they are computationally complex because of the strategy of following a one-step scheme in selecting the features. In this paper, a novel feature selection APDFS algorithm is proposed which explicitly handles the feature associated with the class label by relevance and redundancy analysis. The proposed algorithm handles the process of feature selection in two separate indices. Hence, the computational complexity of the algorithm is reduced to O(nk+1). The algorithm applies the statistical divergence-based information theory to identify the relationship between the distant features of the chronic disease data set. The data set required to experiment is obtained from several medical labs and hospitals of karkala taluk ,India. The HLRM is used as a machine learning classifier. The predictive ability of the framework is compared with the various algorithm and also with the various chronic disease data set. The experimental result illustrates that the proposed framework is efficient and achieved competitive results are compared to the existing work in most of the cases.

Download Full-text

An Effective Feature Selection Approach for Network Intrusion Detection

2013 IEEE Eighth International Conference on Networking, Architecture and Storage ◽

10.1109/nas.2013.49 ◽

2013 ◽

Cited By ~ 21

Author(s):

Fengli Zhang ◽

Dan Wang

Keyword(s):

Feature Selection ◽

Intrusion Detection ◽

Network Intrusion Detection ◽

Network Intrusion ◽

Selection Approach ◽

Feature Selection Approach

Download Full-text

SU-CCE: A Novel Feature Selection Approach for Reducing High Dimensionality

10.3233/apc210196 ◽

2021 ◽

Author(s):

A B Pawar ◽

M A Jawale ◽

Ravi Kumar Tirandasu ◽

Saiprasad Potharaju

Keyword(s):

Feature Selection ◽

Classification Accuracy ◽

Feature Space ◽

Microarray Dataset ◽

Classification Model ◽

High Dimensionality ◽

High Dimensional ◽

Selection Approach ◽

Feature Selection Approach ◽

Careful Investigation

High dimensionality is the serious issue in the preprocessing of data mining. Having large number of features in the dataset leads to several complications for classifying an unknown instance. In a initial dataspace there may be redundant and irrelevant features present, which leads to high memory consumption, and confuse the learning model created with those properties of features. Always it is advisable to select the best features and generate the classification model for better accuracy. In this research, we proposed a novel feature selection approach and Symmetrical uncertainty and Correlation Coefficient (SU-CCE) for reducing the high dimensional feature space and increasing the classification accuracy. The experiment is performed on colon cancer microarray dataset which has 2000 features. The proposed method derived 38 best features from it. To measure the strength of proposed method, top 38 features extracted by 4 traditional filter-based methods are compared with various classifiers. After careful investigation of result, the proposed approach is competing with most of the traditional methods.

Download Full-text

An Empirical Evaluation of Feature Selection Methods

Improving Knowledge Discovery through the Integration of Data Mining Techniques - Advances in Data Mining and Database Management ◽

10.4018/978-1-4666-8513-0.ch012 ◽

2015 ◽

pp. 233-258 ◽

Cited By ~ 1

Author(s):

Mohsin Iqbal ◽

Saif Ur Rehman ◽

Saira Gillani ◽

Sohail Asghar

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Classification Accuracy ◽

Information Gain ◽

Learning Algorithm ◽

Empirical Evaluation ◽

Machine Learning Algorithms ◽

Selection Methods ◽

The One ◽

Processing And Storage

The key objective of the chapter would be to study the classification accuracy, using feature selection with machine learning algorithms. The dimensionality of the data is reduced by implementing Feature selection and accuracy of the learning algorithm improved. We test how an integrated feature selection could affect the accuracy of three classifiers by performing feature selection methods. The filter effects show that Information Gain (IG), Gain Ratio (GR) and Relief-f, and wrapper effect show that Bagging and Naive Bayes (NB), enabled the classifiers to give the highest escalation in classification accuracy about the average while reducing the volume of unnecessary attributes. The achieved conclusions can advise the machine learning users, which classifier and feature selection methods to use to optimize the classification accuracy, and this can be important, especially at risk-sensitive applying Machine Learning whereas in the one of the aim to reduce costs of collecting, processing and storage of unnecessary data.

Download Full-text

Federated Learning-Based Network Intrusion Detection with a Feature Selection Approach

10.1109/icecce52056.2021.9514222 ◽

2021 ◽

Author(s):

Yang Qin ◽

Masaaki Kondo

Keyword(s):

Feature Selection ◽

Intrusion Detection ◽

Network Intrusion Detection ◽

Network Intrusion ◽

Selection Approach ◽

Feature Selection Approach

Download Full-text

A feature selection approach to find optimal feature subsets for the network intrusion detection system

Cluster Computing ◽

10.1007/s10586-015-0527-8 ◽

2016 ◽

Vol 19 (1) ◽

pp. 325-333 ◽

Cited By ~ 26

Author(s):

Seung-Ho Kang ◽

Kuinam J. Kim

Keyword(s):

Feature Selection ◽

Intrusion Detection ◽

Intrusion Detection System ◽

Detection System ◽

Network Intrusion Detection ◽

Network Intrusion ◽

Network Intrusion Detection System ◽

Selection Approach ◽

Feature Selection Approach ◽

Optimal Feature

Download Full-text

A Network Intrusion Detection System Based On Ensemble CVM Using Efficient Feature Selection Approach

Procedia Computer Science ◽

10.1016/j.procs.2018.10.416 ◽

2018 ◽

Vol 143 ◽

pp. 442-449 ◽

Cited By ~ 5

Author(s):

T.H. Divyasree ◽

K.K. Sherly

Keyword(s):

Feature Selection ◽

Intrusion Detection ◽

Intrusion Detection System ◽

Detection System ◽

Network Intrusion Detection ◽

Network Intrusion ◽

Network Intrusion Detection System ◽

Selection Approach ◽

Feature Selection Approach

Download Full-text

An Efficient Malware Detection System using Hybrid Feature Selection Methods

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.a1043.1291s319 ◽

2019 ◽

Vol 9 (1S3) ◽

pp. 224-228

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Unique Feature ◽

Detection System ◽

Malware Detection ◽

Selection Methods ◽

Training Process ◽

Detection Systems ◽

Selection Approach ◽

Feature Selection Approach

Malware is a serious threat to individuals and users. The security researchers present various solutions, striving to achieve efficient malware detection. Malware attackers devise detection avoidance techniques to escape from detection systems. The key challenge is that growth of malware increases every hour, leading to large damages to users’ privacy. The training process takes much longer time, mining the unnecessary features. Feature Selection is effective in achieving unique feature set in detecting malware. In this paper, we propose a malware detection system using hybrid feature selection approach to detect malware efficiently with a reduced feature set. Machine learning based classification is performed on eight classifiers with two malware datasets. The experiments were done without and with feature selection. The empirical results show that the classification using selected feature set and XGB classifier identifies malware efficiently with an accuracy of 98.9% and 99.26% for the two datasets.

Download Full-text

Hybrid Feature Selection Approach to Improve the Deep Neural Network on New Flow-Based Dataset for NIDS

Wasit Journal of Computer and Mathematics Science ◽

10.31185/wjcm.vol1.iss1.10 ◽

2021 ◽

Vol 1 (1) ◽

pp. 66-83

Author(s):

Rawaa Ismael Farhan ◽

Abeer Tariq Maolood ◽

NidaaFlaih Hassan

Keyword(s):

Feature Selection ◽

Detection System ◽

Machine Learning Algorithms ◽

Binary Particle Swarm Optimization ◽

Feature Subset ◽

Network Intrusion ◽

Selection Approach ◽

Long Time ◽

Feature Selection Approach ◽

Computational Resources

Network Intrusion Detection System (NIDS) detects normal and malicious behavior by analyzing network traffic, this analysis has the potential to detect novel attacks especially in IoT environments. Deep Learning (DL)has proven its outperformance compared to machine learning algorithms in solving the complex problems of the real-world like NIDS. Although, this approach needs more computational resources and consumes a long time. Feature selection plays a significant role in choosing the best features only that describe the target concept optimally during a classification process. However, when handling a large number of features the selecting such relevant features becomes a difficult task. Therefore, this paper proposes Enhanced BPSO using Binary Particle Swarm Optimization (BPSO) and correlation–based (CFS) classical statistical feature selection approach to solve the problem on BPSO feature selection. The selected feature subset has evaluated on Deep Neural Networks (DNN) classifiers and the new flow-based CSE-CIC-IDS2018 dataset. Experimental results have shown a high accuracy of 95% based on processing time, detection rate, and false alarm rate compared with other benchmark classifiers.

Download Full-text