A new proactive feature selection model based on the enhanced optimization algorithms to detect DRDoS attacks

Riyadh Rahef Nuiaa; Selvakumar Manickam; Ali Hakem Alsaeedi; Esraa Saleh Alomari

doi:10.11591/ijece.v12i2.pp1869-1880

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v12i2.pp1869-1880 ◽

2022 ◽

Vol 12 (2) ◽

pp. 1869

Author(s):

Riyadh Rahef Nuiaa ◽

Selvakumar Manickam ◽

Ali Hakem Alsaeedi ◽

Esraa Saleh Alomari

Keyword(s):

Feature Selection ◽

Detection System ◽

False Positive Rate ◽

Denial Of Service ◽

Selection Model ◽

Machine Learning Algorithms ◽

Feature Subset Selection ◽

Support Vector ◽

Feature Subset ◽

New Variant

Cyberattacks have grown steadily over the last few years. The distributed reflection denial of service (DRDoS) attack, a new variant of the distributed denial of service (DDoS) attack, has been on the rise. DRDoS attacks are more difficult to mitigate because of their dynamics and attack strategy. An intrusion detection system investigates the behavior of traffic, and the number of features it examines influences its performance. A feature selection model therefore improves the accuracy of the detection mechanism and also reduces detection time by reducing the number of features. The proposed model, called the proactive feature selection (PFS) model, aims to detect DRDoS attacks based on feature selection. It uses a nature-inspired optimization algorithm for feature subset selection. Three machine learning algorithms, namely k-nearest neighbor (KNN), random forest (RF), and support vector machine (SVM), were evaluated as potential classifiers for assessing the selected features. The CICDDoS2019 dataset was used for evaluation. The performance of each classifier is compared with previous models. The results indicate that the suggested model works better than current approaches, providing a higher detection rate (DR), a low false-positive rate (FPR), and increased detection accuracy (DA). The PFS model detects DRDoS attacks with an accuracy of 89.59%.
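
The abstract reports a detection rate (DR), a false-positive rate (FPR), and detection accuracy without defining them. The sketch below assumes the standard confusion-matrix definitions (DR as TP / (TP + FN), FPR as FP / (FP + TN)); the function name and the counts are hypothetical and are not taken from the paper.

```python
def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute DR, FPR, and accuracy from confusion-matrix counts.

    Assumes the usual definitions; the paper may define them differently.
    """
    total = tp + fp + tn + fn
    dr = tp / (tp + fn) if (tp + fn) else 0.0    # share of attacks detected
    fpr = fp / (fp + tn) if (fp + tn) else 0.0   # share of benign flows flagged
    acc = (tp + tn) / total if total else 0.0    # share of all flows classified correctly
    return {"DR": dr, "FPR": fpr, "accuracy": acc}


# Hypothetical counts for illustration only.
metrics = detection_metrics(tp=90, fp=5, tn=95, fn=10)
print(metrics)
```

A high DR with a low FPR is the combination the abstract claims; accuracy alone can hide a poor FPR when attack traffic dominates the dataset.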

A Hybrid Feature Selection Method for Improve the Accuracy of Medical Classification Process

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.a9624.1111121 ◽

2021 ◽

Vol 11 (1) ◽

pp. 50-55

Author(s):

Maria Mohammad Yousef ◽

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Dimensionality Reduction ◽

Classification Accuracy ◽

Fitness Function ◽

Machine Learning Algorithms ◽

Feature Subset Selection ◽

High Dimensionality ◽

Support Vector ◽

Feature Subset

Medical dataset classification has become one of the biggest problems in data mining research. Every database has a given number of features, but some of these features can be redundant or even harmful and can disrupt the classification process; this is known as the high dimensionality problem. Dimensionality reduction during data preprocessing is critical for increasing the performance of machine learning algorithms. In addition, feature subset selection contributes to dimensionality reduction and significantly improves classification accuracy. In this paper, we propose a new hybrid feature selection approach based on a genetic algorithm (GA) assisted by k-nearest neighbor (KNN) to deal with high dimensionality in biomedical data classification. The proposed method first combines GA and KNN for feature selection to find the optimal subset of features, where the classification accuracy of KNN is used as the fitness function for the GA. After the best subset of features is selected, a support vector machine (SVM) is used as the classifier. The proposed method is evaluated on five medical datasets from the UCI Machine Learning Repository. The suggested technique performs well on these datasets, achieving higher classification accuracy while using fewer features.
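
The GA-plus-KNN wrapper described above can be sketched as follows. This is a minimal illustration, assuming leave-one-out KNN accuracy as the fitness, tournament selection, one-point crossover, bit-flip mutation, and elitism; none of these operator choices, parameter values, or function names come from the paper, and the toy data is synthetic.

```python
import random


def knn_loo_accuracy(X, y, mask, k=3):
    """Leave-one-out accuracy of k-NN using only the features where mask is 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    correct = 0
    for i, xi in enumerate(X):
        neighbours = sorted(
            (sum((xi[j] - xo[j]) ** 2 for j in cols), y[o])
            for o, xo in enumerate(X)
            if o != i
        )
        votes = [label for _, label in neighbours[:k]]
        if max(set(votes), key=votes.count) == y[i]:
            correct += 1
    return correct / len(X)


def ga_select(X, y, generations=15, pop_size=12, p_mut=0.1, seed=0):
    """Search for a binary feature mask whose k-NN accuracy is high."""
    rng = random.Random(seed)
    n = len(X[0])

    def fitness(mask):
        return knn_loo_accuracy(X, y, mask)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    pop[0] = [1] * n  # include the all-features mask as a baseline
    best = max(pop, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            # Two binary tournaments pick the parents.
            a, b = (max(rng.sample(pop, 2), key=fitness) for _ in range(2))
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < p_mut) for bit in child]  # bit-flip mutation
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best, fitness(best)


# Synthetic toy data: feature 0 tracks the label, the other three are noise.
data_rng = random.Random(1)
y = [i % 2 for i in range(20)]
X = [[label + data_rng.gauss(0, 0.1)] + [data_rng.random() for _ in range(3)] for label in y]
mask, score = ga_select(X, y)
print(mask, score)
```

Because the all-features mask is seeded into the first population and the best mask is carried forward, the returned subset is never worse on this fitness than using every feature. The selected columns would then be passed to a separate classifier such as an SVM, as the abstract describes.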

FAST FEATURE SUBSET SELECTION IN BIOLOGICAL SEQUENCE ANALYSIS

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001409007107 ◽

2009 ◽

Vol 23 (02) ◽

pp. 191-207 ◽

Cited By ~ 2

Author(s):

RAINER PUDIMAT ◽

ROLF BACKOFEN ◽

ERNST G. SCHUKAT-TALAMAZZINI

Keyword(s):

Feature Selection ◽

Search Algorithms ◽

Machine Learning Algorithms ◽

Feature Subset Selection ◽

Biological Research ◽

Selection Strategy ◽

Support Vector ◽

Feature Subset ◽

Biological Sequence ◽

Biological Sequence Analysis

Biological research produces a wealth of measured data. It is not easy for biologists to postulate hypotheses about the behavior or structure of the observed entity, because the relevant measured properties are lost in the ocean of measurements. For the same reason, it is not easy to design machine learning algorithms that classify or cluster the data items. Algorithms that automatically select a highly predictive subset of the measured features can help to overcome these difficulties. We present an efficient feature selection strategy that can be applied to arbitrary feature selection problems. The core technique is a new method for estimating the quality of subsets from previously calculated qualities of smaller subsets by minimizing the mean standard error of the estimated values with an approach common to support vector machines. This method can be integrated into many feature subset search algorithms. We have applied it with sequential search algorithms and have been able to reduce the number of quality calculations needed to find accurate feature subsets by about 70%. We demonstrate these improvements by applying our approach to the problem of finding highly predictive feature subsets for transcription factor binding sites.
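
For context, a plain sequential forward search, the kind of algorithm the quality estimator plugs into, can be sketched as below. This sketch does not implement the paper's estimation method; it only counts how many exact quality calculations the baseline search needs. The `quality` function and the per-feature weights are hypothetical.

```python
def sequential_forward_selection(n_features, quality, target_size):
    """Greedily add the feature that most improves the subset quality.

    Returns the selected feature indices and the number of quality evaluations.
    """
    selected = frozenset()
    evaluations = 0
    while len(selected) < target_size:
        best_feature, best_quality = None, float("-inf")
        for f in range(n_features):
            if f in selected:
                continue
            q = quality(selected | {f})  # one exact (expensive) quality calculation
            evaluations += 1
            if q > best_quality:
                best_feature, best_quality = f, q
        selected = selected | {best_feature}
    return sorted(selected), evaluations


# Hypothetical additive quality: each feature contributes a fixed weight.
weights = [0.1, 0.9, 0.5, 0.7]
subset, n_evals = sequential_forward_selection(
    n_features=4,
    quality=lambda s: sum(weights[f] for f in s),
    target_size=2,
)
print(subset, n_evals)
```

Every candidate subset costs one quality calculation here, so an estimator that predicts the quality of most candidates from smaller subsets directly reduces that count, which is the saving the abstract reports.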

Sentiment Analysis Using Hybrid Feature Selection Techniques

UHD Journal of Science and Technology ◽

10.21928/uhdjst.v4n1y2020.pp29-40 ◽

2020 ◽

Vol 4 (1) ◽

pp. 29

Author(s):

Sasan Sarbast Abdulkhaliq ◽

Aso Mohammad Darwesh

Keyword(s):

Machine Learning ◽

Social Media ◽

Feature Selection ◽

Classification Accuracy ◽

Feature Selection Method ◽

Selection Method ◽

Machine Learning Algorithms ◽

Feature Subset Selection ◽

Support Vector ◽

Feature Subset

Nowadays, people all over the world use social media and social networks to express their feelings about different topics. One of the most popular social media platforms is Twitter, a microblogging website where users publicly share their views and feelings about products, services, events, and so on. This makes Twitter one of the most valuable sources from which researchers and developers can collect and analyze data to reveal people's sentiment about different topics, such as the products and services of commercial companies and well-known people such as politicians and athletes, by classifying those sentiments as positive or negative. Classification of people's sentiment can be automated using machine learning algorithms and enhanced using appropriate feature selection methods. We collected the most recent tweets about Amazon, Trump, Chelsea FC, and CR7 using the Twitter application programming interface (API) and assigned sentiment scores using a lexicon rule-based approach. We then proposed a machine learning model that improves classification accuracy through a hybrid feature selection method, namely the filter-based chi-square (Chi-2) method plus the wrapper-based binary coordinate ascent (BCA) method (Chi-2 + BCA). The method selects an optimal subset of features from term frequency-inverse document frequency (TF-IDF) features for classification with a support vector machine (SVM), and from bag-of-words features for logistic regression (LR) classifiers, using different n-gram ranges. A comparison of the hybrid (Chi-2 + BCA) method with Chi-2-selected features, and with the classifiers without feature subset selection, shows that the hybrid feature selection method increases classification accuracy in all cases. The maximum accuracy attained with LR is 86.55% using the (1 + 2 + 3)-gram range, and with SVM is 85.575% using the unigram range, both on the CR7 dataset.
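
The filter stage can be illustrated with the chi-square statistic for a term against a binary sentiment label. This is a minimal sketch assuming a 2x2 document-count table per term; the paper applies Chi-2 to TF-IDF and bag-of-words features and may compute it differently. The terms and counts below are hypothetical.

```python
def chi2_term_score(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic for a 2x2 term/class contingency table.

    a: positive documents containing the term
    b: negative documents containing the term
    c: positive documents without the term
    d: negative documents without the term
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


# Hypothetical document counts for two terms.
tables = {"great": (40, 5, 10, 45), "the": (25, 25, 25, 25)}
scores = {term: chi2_term_score(*counts) for term, counts in tables.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

A term that is spread evenly across both classes scores zero and is dropped by the filter, while a term concentrated in one class scores high and is passed on to the wrapper stage.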

Improved TLBO-JAYA Algorithm for Subset Feature Selection and Parameter Optimisation in Intrusion Detection System

Complexity ◽

10.1155/2020/5287684 ◽

2020 ◽

Vol 2020 ◽

pp. 1-18 ◽

Cited By ~ 1

Author(s):

Mohammad Aljanabi ◽

Mohd Arfian Ismail ◽

Vitaly Mezhuyev

Keyword(s):

Support Vector Machine ◽

Feature Selection ◽

Intrusion Detection ◽

Intrusion Detection System ◽

Detection System ◽

Parameter Tuning ◽

Feature Subset Selection ◽

Supervised Machine Learning ◽

Support Vector ◽

Feature Subset

Many optimisation-based intrusion detection algorithms have been developed and are widely used for intrusion identification. This condition is attributed to the increasing number of audit data features and the decreasing performance of human-based smart intrusion detection systems regarding classification accuracy, false alarm rate, and classification time. Feature selection and classifier parameter tuning are important factors that affect the performance of any intrusion detection system. In this paper, an improved intrusion detection algorithm for multiclass classification is presented and discussed in detail. The proposed method combines the improved teaching-learning-based optimisation (ITLBO) algorithm, the improved parallel JAYA (IPJAYA) algorithm, and a support vector machine (SVM). ITLBO with a supervised machine learning (ML) technique is used for feature subset selection (FSS). Selecting the smallest number of features without affecting the accuracy of the result is a multiobjective optimisation problem. This work proposes ITLBO as an FSS mechanism and explores its algorithm-specific, parameterless concept (no parameter tuning is required during optimisation). IPJAYA is used to update the C and gamma parameters of the SVM. Several experiments were performed on a prominent intrusion ML dataset, where significant enhancements were observed with the suggested ITLBO-IPJAYA-SVM algorithm compared with the classical TLBO and JAYA algorithms.
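
For reference, the classical JAYA update that the paper improves on can be sketched as follows. This is the basic algorithm, not IPJAYA, and the objective is a hypothetical stand-in for the SVM validation error over two parameters (think of them as log C and log gamma); the bounds, population size, and iteration count are assumptions.

```python
import random


def jaya_minimize(objective, bounds, pop_size=10, iterations=50, seed=0):
    """Classical JAYA: move towards the best solution and away from the worst."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(x) for x in pop]
    history = [min(scores)]
    for _ in range(iterations):
        best = pop[scores.index(min(scores))]
        worst = pop[scores.index(max(scores))]
        for i, x in enumerate(pop):
            candidate = []
            for j, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                v = x[j] + r1 * (best[j] - abs(x[j])) - r2 * (worst[j] - abs(x[j]))
                candidate.append(min(max(v, lo), hi))  # clip to the search bounds
            s = objective(candidate)
            if s < scores[i]:  # greedy acceptance: keep only improvements
                pop[i], scores[i] = candidate, s
        history.append(min(scores))
    i_best = scores.index(min(scores))
    return pop[i_best], scores[i_best], history


# Hypothetical smooth objective with its minimum at (1.0, -2.0).
def validation_error(params):
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


bounds = [(-3.0, 3.0), (-5.0, 1.0)]
solution, error, history = jaya_minimize(validation_error, bounds)
print(solution, error)
```

The update has no algorithm-specific parameters beyond population size and iteration count, which is the "parameterless" property the abstract mentions for this family of algorithms.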

A novel feature selection algorithm based on damping oscillation theory

PLoS ONE ◽

10.1371/journal.pone.0255307 ◽

2021 ◽

Vol 16 (8) ◽

pp. e0255307

Author(s):

Fujun Wang ◽

Xing Wang

Keyword(s):

Feature Selection ◽

Optimization Algorithm ◽

Euclidean Distance ◽

Oscillation Theory ◽

Feature Subset Selection ◽

Support Vector ◽

Data Sets ◽

Feature Subset ◽

Selection Algorithm ◽

Filter Model

Feature selection is an important task in big data analysis and information retrieval. It reduces the number of features by removing noisy and extraneous data. In this paper, a feature subset selection algorithm based on damping oscillation theory and a support vector machine classifier is proposed. The algorithm is called the Maximum Kendall coefficient Maximum Euclidean Distance Improved Gray Wolf Optimization algorithm (MKMDIGWO). In MKMDIGWO, first, a filter model based on the Kendall coefficient and Euclidean distance is proposed to measure the correlation and redundancy of the candidate feature subset. Second, the wrapper model is an improved gray wolf optimization algorithm whose position update formula has been modified to achieve better results. Third, the filter model and the wrapper model are dynamically adjusted by damping oscillation theory to find an optimal feature subset. MKMDIGWO therefore combines the efficiency of the filter model with the high precision of the wrapper model. Experimental results on five UCI public data sets and two microarray data sets demonstrate that the MKMDIGWO algorithm achieves higher classification accuracy than four other state-of-the-art algorithms. The maximum ACC value of the MKMDIGWO algorithm is at least 0.5% higher than that of the other algorithms on 10 data sets.
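
The two quantities named in the filter model can be computed as below. This is a minimal sketch of the Kendall rank correlation coefficient (tau-a, ignoring tie corrections) and the Euclidean distance; how MKMDIGWO combines them into a single filter score is not stated in the abstract and is not attempted here. The sample vectors are hypothetical.

```python
import math
from itertools import combinations


def kendall_tau(x, y):
    """Kendall tau-a: (concordant pairs - discordant pairs) / total pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


feature = [1, 2, 3, 4]
print(kendall_tau(feature, [10, 20, 30, 40]))  # same ordering
print(kendall_tau(feature, [40, 30, 20, 10]))  # reversed ordering
print(euclidean([0, 0], [3, 4]))
```

A filter in this style would typically favor features that correlate strongly with the class label and lie far from already-selected features, though the exact scoring rule is the paper's own.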

Scrutinizing Attacks and Evaluating Performance Appraisal Parameters via Feature Selection in Intrusion Detection System

10.21203/rs.3.rs-748765/v1 ◽

2021 ◽

Author(s):

Navroop Kaur ◽

Meenakshi Bansal ◽

Sukhwinder Singh S

Keyword(s):

Feature Selection ◽

Performance Evaluation ◽

Intrusion Detection ◽

Intrusion Detection System ◽

Detection System ◽

Denial Of Service ◽

Cyber Attacks ◽

Support Vector ◽

K Nearest Neighbor ◽

Evaluation Parameters

In modern times, firewalls and antivirus packages are not good enough to protect an organization from numerous cyber attacks. An intrusion detection system (IDS) is a crucial contributor to the success of an organization. An IDS is a software application responsible for scanning an organization's networks for suspicious activities and policy violations. It ensures the secure and reliable functioning of the network within an organization. IDSs have undergone huge transformations since their origin to cope with advancing computer crime. The primary aim of an IDS has been to improve the detection of attacks without endangering the performance of the network. This research paper elaborates on the different types of IDS and the functions they perform. The NSL KDD dataset is used for training and testing. Seven prominent classifiers, LR (Logistic Regression), NB (Naïve Bayes), DT (Decision Tree), AB (AdaBoost), RF (Random Forest), kNN (k Nearest Neighbor), and SVM (Support Vector Machine), are studied along with their pros and cons, and feature selection is applied to improve the performance evaluation parameters (Accuracy, Precision, Recall, and F1Score). The paper presents a detailed flowchart and algorithm depicting the procedure for feature selection using XGB (Extreme Gradient Booster) for four categories of attacks: DoS (Denial of Service), Probe, R2L (Remote to Local Attack), and U2R (User to Root Attack). The selected features are ranked by their occurrence. The implementation was conducted at five different ratios: 60-40%, 70-30%, 90-10%, 50-50%, and 80-20%. Different classifiers scored best for different performance evaluation parameters at different ratios. NB achieved the best Accuracy and Recall values. DT and RF consistently performed with high accuracy. NB, SVM, and kNN achieved good F1Score.

A New Hybrid Feature Subset Selection Framework Based on Binary Genetic Algorithm and Information Theory

International Journal of Computational Intelligence and Applications ◽

10.1142/s1469026819500202 ◽

2019 ◽

Vol 18 (03) ◽

pp. 1950020 ◽

Cited By ~ 13

Author(s):

Alok Kumar Shukla ◽

Pradeep Singh ◽

Manu Vardhan

Keyword(s):

Genetic Algorithm ◽

Feature Selection ◽

Classification Accuracy ◽

B Cell Lymphoma ◽

Feature Subset Selection ◽

Classification Model ◽

Significant Feature ◽

Support Vector ◽

Feature Subset ◽

Binary Genetic Algorithm

The explosion of high-dimensional datasets in scientific repositories has encouraged interdisciplinary research on data mining, pattern recognition, and bioinformatics. The fundamental problem of an individual feature selection (FS) method is to extract informative features for the classification model and to detect malignant disease at low computational cost. In addition, existing FS approaches overlook the fact that, for a given cardinality, there can be several subsets with similar information. This paper introduces a novel hybrid FS algorithm for classification problems, called Filter-Wrapper Feature Selection (FWFS), which addresses the limitations of existing methods. In the proposed model, the front-end filter ranking method, Conditional Mutual Information Maximization (CMIM), selects the highly ranked feature subset, while the succeeding method, a Binary Genetic Algorithm (BGA), accelerates the search for significant feature subsets. One merit of the proposed method is that, unlike an exhaustive method, it speeds up the FS procedure without reducing classification accuracy on the reduced dataset when a learning model is applied to the selected subsets of features. The efficacy of the proposed FWFS method is examined with a Naive Bayes (NB) classifier, which works as the fitness function. The effectiveness of the selected feature subset is evaluated using numerous classifiers on five biological datasets and five UCI datasets of varied dimensionality and number of instances. The experimental results show that the proposed method significantly reduces the number of features and outperforms the existing methods. For the microarray datasets, the lowest classification accuracy is 61.24% on the SRBCT dataset and the highest is 99.32% on the Diffuse large B-cell lymphoma (DLBCL) dataset. For the UCI datasets, the lowest classification accuracy is 40.04% on the Lymphography dataset using k-nearest neighbor (k-NN) and the highest is 99.05% on the Ionosphere dataset using a support vector machine (SVM).

Intrusion Detection System using SMIFS and Multi class Multi layer Perceptron

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.i8982.078919 ◽

2019 ◽

Vol 8 (9) ◽

pp. 2622-2628

Keyword(s):

Feature Extraction ◽

Feature Selection ◽

Mutual Information ◽

New Technologies ◽

Detection System ◽

Feature Selection Method ◽

Machine Learning Algorithms ◽

Feature Subset ◽

Classification Problems ◽

Data Set

As new technologies emerge, data is being generated in larger volumes and higher dimensions. The high dimensionality of data can pose a great challenge for classification. The presence of redundant features and noisy data degrades the performance of the model, so it is necessary to extract the relevant features from a given data set. Feature extraction is an important step in many machine learning algorithms, and many researchers have attempted it. Among the different feature extraction methods, mutual information is a widely used feature selection method because it quantifies the dependency among features in classification problems well. To cope with this issue, in this paper we propose a simplified mutual information based feature selection method with less computational overhead. The selected feature subset is tested with a multilayer perceptron on the KDD CUP 99 data set with 2-class, 5-class, and 4-class classification. The accuracy of these models is almost the same with fewer features.
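
The quantity underlying this kind of selection, the mutual information between a discrete feature and the class label, can be computed as below. This is a minimal sketch of plain mutual-information ranking, not the paper's simplified method; the feature names and values are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


# Hypothetical toy data: one informative and one uninformative feature.
labels = [0, 0, 1, 1]
features = {
    "flag": [0, 0, 1, 1],    # identical to the label
    "parity": [0, 1, 0, 1],  # independent of the label
}
mi = {name: mutual_information(values, labels) for name, values in features.items()}
ranked = sorted(mi, key=mi.get, reverse=True)
print(ranked, mi)
```

Ranking features by this score and keeping the top few is the simplest mutual-information filter; methods such as the one in the abstract additionally try to limit the cost of accounting for redundancy among the selected features.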

Optimal Feature Subset Selection for Imbalanced Class Data using SMOTE and Binary ALO Algorithm

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.c4734.029320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 344-349

Keyword(s):

Feature Selection ◽

Class Imbalance ◽

Classification Performance ◽

Selection Model ◽

Feature Subset Selection ◽

Feature Subset ◽

Spatial Features ◽

Imbalanced Classes ◽

Optimal Feature Subset ◽

Optimal Feature

Feature selection in multispectral, high-dimensional data is a hard machine learning problem because of the imbalanced classes present in the data. Most of the existing feature selection schemes in the literature ignore the problem of class imbalance by choosing features from the classes with more instances and neglecting significant features of the classes with fewer instances. In this paper, the SMOTE concept is exploited to produce the required samples from the minority classes. A feature selection model is formulated with the objective of reducing the number of features while improving classification performance. This model is based on dimensionality reduction by opting for a subset of relevant spectral, textural, and spatial features while eliminating the redundant features. Binary ALO is used to solve the feature selection model for optimal selection of features. The proposed ALO-SVM with the wrapper concept is applied to each potential solution obtained during the optimization step. The methodology is tested on a LANDSAT multispectral image.
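
The oversampling step can be sketched as follows. This is a minimal illustration of the core SMOTE idea, interpolating between a minority sample and one of its nearest minority neighbours; it is not a full implementation, and the function name, parameter values, and sample points are hypothetical.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the chosen base point.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


# Hypothetical minority-class samples with two features each.
minority = [[1.0, 1.0], [2.0, 1.0], [1.0, 3.0], [2.0, 2.0]]
new_points = smote(minority, n_new=5)
print(new_points)
```

Each synthetic point lies on a line segment between two real minority samples, so the minority class grows without simply duplicating existing rows before feature selection is run.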

An ensemble feature selection approach using hybrid kernel based SVM for network intrusion detection system

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v23.i1.pp558-565 ◽

2021 ◽

Vol 23 (1) ◽

pp. 558

Author(s):

Gaddam Venu Gopal ◽

Gatram Rama Mohan Babu

Keyword(s):

Feature Selection ◽

Intrusion Detection ◽

Intrusion Detection System ◽

Detection System ◽

Network Intrusion Detection ◽

Support Vector ◽

Feature Subset ◽

Network Intrusion ◽

Feature Selection Approach ◽

Hybrid Kernel

Feature selection is the process of identifying a relevant feature subset that guides the machine learning algorithm in a well-defined manner. In this paper, a novel ensemble feature selection approach that comprises Relief Attribute Evaluation and a hybrid kernel-based support vector machine (HK-SVM) is proposed as a feature selection method for network intrusion detection systems (NIDS). A hybrid approach that combines Gaussian and polynomial methods is used as the kernel for the support vector machine (SVM). The key issue is to select a feature subset that yields good accuracy at minimal computational cost. The proposed approach is implemented and compared with a classical SVM and a simple kernel. Kyoto2006+, a benchmark intrusion detection dataset, is used for experimental evaluation, from which observations are drawn.
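
One common way to combine a Gaussian and a polynomial kernel is a weighted sum, sketched below. The abstract does not say how the two kernels are combined, so the convex combination, the mixing weight, and the kernel parameters here are all assumptions rather than the paper's HK-SVM.

```python
import math


def gaussian_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def polynomial_kernel(x, z, degree=2, coef0=1.0):
    """Polynomial kernel: (<x, z> + coef0) ** degree."""
    return (sum(a * b for a, b in zip(x, z)) + coef0) ** degree


def hybrid_kernel(x, z, weight=0.5):
    """Assumed hybrid: convex combination of the Gaussian and polynomial kernels."""
    return weight * gaussian_kernel(x, z) + (1 - weight) * polynomial_kernel(x, z)


print(hybrid_kernel([1.0, 0.0], [1.0, 0.0]))
print(hybrid_kernel([1.0, 2.0], [0.5, -1.0]))
```

A non-negative weighted sum of two valid kernels is itself a valid kernel, so a function of this form can be passed to any SVM implementation that accepts a custom kernel. The Gaussian term captures local similarity and the polynomial term captures global structure, with the weight controlling the balance.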
