Evaluation of feature selection techniques on network traffic for comparing model accuracy

Anchit Bijalwan; Amit Awasthi; Prabhjot Kaur

doi:10.1504/ijcse.2021.10035715

International Journal of Computational Science and Engineering, 2021, Vol 24 (3), pp. 228
doi:10.1504/ijcse.2021.115654

Author(s): Prabhjot Kaur; Amit Awasthi; Anchit Bijalwan

Keyword(s): Feature Selection; Network Traffic; Model Accuracy; Feature Selection Techniques

Automated Feature Selection for Anomaly Detection in Network Traffic Data

ACM Transactions on Management Information Systems, 2021, Vol 12 (3), pp. 1-28
doi:10.1145/3446636

Author(s): Makiya Nakashima; Alex Sim; Youngsoo Kim; Jonghyun Kim; Jinoh Kim

Keyword(s): Feature Selection; Anomaly Detection; Network Traffic; Selection Process; Traffic Data; Ensemble Techniques; Building Models; Comparable Performance; Network Anomaly Detection; Feature Selection Techniques

Variable selection (also known as feature selection) is essential for reducing learning complexity by prioritizing features, particularly for massive, high-dimensional datasets such as network traffic data. In practice, however, performing feature selection effectively is not easy, despite the availability of many existing selection techniques. In our initial experiments, we observed that existing selection techniques produce different sets of features even under the same conditions (e.g., a fixed size for the resulting set). In addition, individual selection techniques perform inconsistently, sometimes better and sometimes worse than others, so relying on any single one of them is risky when building models from the selected features. More critically, there is a pressing need to automate the selection process, since otherwise it requires laborious, intensive analysis by a group of experts. In this article, we explore the challenges of automated feature selection, applied to network anomaly detection. We first present an ensemble approach that incorporates the existing feature selection techniques and benefits from them; one of the proposed ensemble techniques, based on greedy search, performs highly consistently and yields results comparable to the existing techniques. We also address the question of when to stop the feature elimination process, and present a set of methods for determining the number of features in the reduced feature set. Experiments on two recent network datasets show that the feature sets identified by the presented ensemble and stopping methods consistently yield performance comparable to conventional selection techniques while using fewer features.
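
The abstract does not spell out its ensemble or stopping methods. As a rough illustration of the two ideas only, the Python sketch below merges several feature rankings by mean rank (a Borda-style aggregation, which is a swapped-in technique and not the greedy-search ensemble described above) and then applies a hypothetical stopping rule: keep the shortest prefix of the merged ranking whose score is within `tolerance` of the best prefix score. The function names, the `score` callback (in practice perhaps cross-validated detection accuracy) and the feature names are all assumptions.

```python
from typing import Callable, Dict, List, Sequence


def aggregate_rankings(rankings: Sequence[Sequence[str]]) -> List[str]:
    """Merge several feature rankings (most important first) by mean position.

    Borda-style aggregation; a stand-in, not the paper's greedy-search ensemble.
    """
    features = set(rankings[0])
    if any(set(r) != features for r in rankings):
        raise ValueError("every ranking must contain the same features")
    positions: Dict[str, List[int]] = {f: [] for f in features}
    for ranking in rankings:
        for pos, feature in enumerate(ranking):
            positions[feature].append(pos)
    # Lower mean position = more important; ties broken by name for determinism.
    return sorted(features, key=lambda f: (sum(positions[f]) / len(positions[f]), f))


def choose_size(ordered: Sequence[str],
                score: Callable[[Sequence[str]], float],
                tolerance: float = 0.01) -> List[str]:
    """Hypothetical stopping rule: shortest prefix scoring within `tolerance` of the best prefix."""
    scores = [score(ordered[:k]) for k in range(1, len(ordered) + 1)]
    threshold = max(scores) - tolerance
    for k, s in enumerate(scores, start=1):
        if s >= threshold:
            return list(ordered[:k])
    return list(ordered)  # not reached: the best prefix always meets the threshold


if __name__ == "__main__":
    # Rankings from three hypothetical selectors (e.g. chi-squared, mutual information, RFE).
    rankings = [
        ["duration", "src_bytes", "dst_bytes", "proto"],
        ["src_bytes", "duration", "proto", "dst_bytes"],
        ["src_bytes", "dst_bytes", "duration", "proto"],
    ]
    merged = aggregate_rankings(rankings)
    # Toy additive score standing in for a model's validation accuracy.
    weights = {"src_bytes": 0.80, "duration": 0.10, "dst_bytes": 0.005, "proto": 0.001}
    chosen = choose_size(merged, lambda fs: sum(weights[f] for f in fs))
    print(merged)
    print(chosen)
```

With these toy inputs the merged order is src_bytes, duration, dst_bytes, proto, and the stopping rule keeps only the first two features, because that prefix already scores within the 0.01 tolerance of the full set.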

Analysis of Feature Selection Techniques for Network Traffic Dataset

2013 International Conference on Machine Intelligence and Research Advancement, 2013
doi:10.1109/icmira.2013.15
Cited By ~ 5

Author(s): Raman Singh; Harish Kumar; R.K. Singla

Keyword(s): Feature Selection; Network Traffic; Feature Selection Techniques

On the value of filter feature selection techniques in homogeneous ensembles effort estimation

Journal of Software: Evolution and Process, 2021
doi:10.1002/smr.2343

Author(s): Mohamed Hosni; Ali Idri; Alain Abran

Keyword(s): Feature Selection; Effort Estimation; Feature Selection Techniques

Arabic Named Entity Recognition on Social Media based on feature selection techniques using SVM-RFE

2020 Fourth International Conference On Intelligent Computing in Data Sciences (ICDS), 2020
doi:10.1109/icds50568.2020.9268762

Author(s): Brahim AIT BEN ALI; Soukaina MIHI; Ismail EL BAZI; Nabil LAACHFOUBI

Keyword(s): Social Media; Feature Selection; Named Entity Recognition; Entity Recognition; Named Entity; Feature Selection Techniques

A lazy feature selection method for multi-label classification

Intelligent Data Analysis, 2021, Vol 25 (1), pp. 21-34
doi:10.3233/ida-194878

Author(s): Rafael B. Pereira; Alexandre Plastino; Bianca Zadrozny; Luiz H.C. Merschmann

Keyword(s): Feature Selection; Text Categorization; Feature Selection Method; Selection Method; Video Classification; Classification Problems; Class Label; New Feature; Feature Selection Techniques; Biomolecular Analysis

In many important application domains, such as text categorization, biomolecular analysis, scene or video classification and medical diagnosis, instances are naturally associated with more than one class label, giving rise to multi-label classification problems. This has led, in recent years, to a substantial amount of research in multi-label classification. In particular, feature selection methods have been developed to identify relevant and informative features for multi-label classification. This work presents a new feature selection method based on the lazy feature selection paradigm and tailored to the multi-label context. Experimental results show that the proposed technique is competitive with the multi-label feature selection techniques currently used in the literature and is clearly more scalable as the amount of data increases.
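
The lazy paradigm defers feature selection to classification time, choosing features separately for each query instance based on the values that instance actually has. The abstract gives no algorithmic detail, so the Python sketch below is only a generic illustration of that idea for multi-label data, not the authors' method: for each discrete feature it looks at the training instances sharing the query's value, scores the feature by the mean binary entropy of each label within that group (lower means more informative), keeps the `m` best features, and predicts every label held by a majority of the training instances that match the query on those features. The entropy score, the majority-vote prediction, the function names and the toy data are all assumptions.

```python
from math import log2
from typing import List, Sequence, Set, Tuple

Instance = Tuple[str, ...]  # one discrete value per feature


def mean_label_entropy(group: Sequence[Set[str]], labels: Set[str]) -> float:
    """Average binary entropy of each label's presence within a group of label sets."""
    if not labels:
        return 0.0
    n = len(group)
    total = 0.0
    for label in labels:
        p = sum(label in s for s in group) / n
        if 0 < p < 1:
            total -= p * log2(p) + (1 - p) * log2(1 - p)
    return total / len(labels)


def lazy_select(query: Instance, X: Sequence[Instance],
                Y: Sequence[Set[str]], m: int) -> List[int]:
    """Pick the m features whose query value best separates the labels (chosen per query)."""
    labels = set().union(*Y)
    scored = []
    for f, value in enumerate(query):
        group = [y for x, y in zip(X, Y) if x[f] == value]
        if group:  # skip values never seen in training: no evidence either way
            scored.append((mean_label_entropy(group, labels), f))
    scored.sort()  # lowest entropy first; ties broken by feature index
    return [f for _, f in scored[:m]]


def lazy_predict(query: Instance, X: Sequence[Instance],
                 Y: Sequence[Set[str]], m: int) -> Set[str]:
    """Predict the label set by majority vote among instances matching the selected features."""
    feats = lazy_select(query, X, Y, m)
    matched = [y for x, y in zip(X, Y) if all(x[f] == query[f] for f in feats)]
    if not matched:
        matched = list(Y)  # fall back to the whole training set
    labels = set().union(*Y)
    return {lab for lab in labels if sum(lab in y for y in matched) > len(matched) / 2}


if __name__ == "__main__":
    # Hypothetical flows: (protocol, service, volume) with multi-label annotations.
    X = [("tcp", "http", "low"), ("tcp", "ftp", "high"),
         ("udp", "dns", "low"), ("udp", "http", "high")]
    Y = [{"web", "encrypted"}, {"bulk"}, {"dns"}, {"web"}]
    query = ("tcp", "http", "high")
    print(lazy_select(query, X, Y, 1))
    print(lazy_predict(query, X, Y, 1))
```

Because selection happens per query, a different test instance (say one with service "dns") could end up with a different feature subset; that per-instance choice is what distinguishes the lazy approach from selecting one global subset before training.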

Children’s Activity Classification for Domestic Risk Scenarios Using Environmental Sound and a Bayesian Network

Healthcare, 2021, Vol 9 (7), pp. 884
doi:10.3390/healthcare9070884

Author(s): Antonio García-Domínguez; Carlos E. Galván-Tejada; Ramón F. Brena; Antonio A. Aguileta; Jorge I. Galván-Tejada; ...

Keyword(s): Feature Selection; Naive Bayes; Naïve Bayes; Classification Model; Activity Classification; Environmental Sound; Non Invasive; Akaike Criterion; Data Source; Feature Selection Techniques

Children’s healthcare, and especially the prevention of domestic accidents, is a relevant issue that has even been defined as a global health problem. Children’s activity classification generally relies on sensors embedded in children’s clothing, which can produce erroneous measurements due to damage or mishandling. Using a non-invasive data source for a children’s activity classification model makes the monitoring system in which it is applied more reliable. This work proposes using environmental sound as a data source for generating children’s activity classification models, implementing feature selection methods and classification techniques based on Bayesian networks, with a focus on recognizing activities that can potentially trigger domestic accidents, for use in child monitoring systems. Two feature selection techniques were used: the Akaike criterion and genetic algorithms. Models were generated using three classifiers: naive Bayes, semi-naive Bayes and tree-augmented naive Bayes. Most of the generated models, which combine the feature selection methods with the classifiers, achieve an accuracy greater than 97%, which indicates that the proposed approach is effective at recognizing activities that can potentially trigger domestic accidents.
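
The abstract names the Akaike criterion as one feature selection technique but does not say how it was applied. One common way to use it, shown in the Python sketch below under our own assumptions, is greedy forward selection: start with no features, repeatedly add the feature that most lowers AIC = 2k - 2 ln L, and stop when no addition lowers it. Here L is taken (an assumption) as the training-set conditional likelihood of the class labels under a categorical naive Bayes model with Laplace smoothing, so it is only approximately a maximum likelihood, and k counts the model's free probability parameters. The function names and toy data are hypothetical.

```python
from collections import Counter
from math import exp, log
from typing import List, Sequence, Tuple

Rows = Sequence[Tuple[str, ...]]  # each row holds one discrete value per feature


def nb_log_likelihood(X: Rows, y: Sequence[str], feats: Sequence[int],
                      alpha: float = 1.0) -> float:
    """Sum of log P(y_i | x_i) on the training data for a categorical naive Bayes
    model restricted to `feats`, with Laplace smoothing `alpha`."""
    n = len(y)
    prior = Counter(y)
    n_values = {f: len({x[f] for x in X}) for f in feats}
    counts = Counter((c, f, x[f]) for x, c in zip(X, y) for f in feats)
    total = 0.0
    for x, c_true in zip(X, y):
        joint = {}
        for c in prior:
            lp = log(prior[c] / n)
            for f in feats:
                lp += log((counts[(c, f, x[f])] + alpha) / (prior[c] + alpha * n_values[f]))
            joint[c] = lp
        top = max(joint.values())  # log-sum-exp for a numerically stable normaliser
        log_norm = top + log(sum(exp(v - top) for v in joint.values()))
        total += joint[c_true] - log_norm
    return total


def aic(X: Rows, y: Sequence[str], feats: Sequence[int]) -> float:
    """AIC = 2k - 2 ln L, with k = (C - 1) + sum over features of C * (V_f - 1)."""
    n_classes = len(set(y))
    k = (n_classes - 1) + sum(n_classes * (len({x[f] for x in X}) - 1) for f in feats)
    return 2 * k - 2 * nb_log_likelihood(X, y, feats)


def forward_select_aic(X: Rows, y: Sequence[str]) -> Tuple[List[int], float]:
    """Greedily add the feature that lowers AIC most; stop when none lowers it."""
    selected: List[int] = []
    best = aic(X, y, selected)
    remaining = set(range(len(X[0])))
    while remaining:
        cand = min(remaining, key=lambda f: (aic(X, y, selected + [f]), f))
        score = aic(X, y, selected + [cand])
        if score >= best:
            break
        selected.append(cand)
        remaining.remove(cand)
        best = score
    return selected, best


if __name__ == "__main__":
    # Toy data: feature 0 determines the class, feature 1 is pure noise.
    X = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")] * 2
    y = ["c1", "c1", "c2", "c2"] * 2
    print(forward_select_aic(X, y))  # keeps feature 0 only
```

On the toy data, adding the noise feature leaves the likelihood unchanged but raises k, so the 2k penalty is what stops the search; that trade-off between fit and model size is the point of using the criterion for feature selection.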

Effective combining of feature selection techniques for machine learning-enabled IoT intrusion detection

Multimedia Tools and Applications, 2021
doi:10.1007/s11042-021-10567-y

Author(s): Md Arafatur Rahman; A. Taufiq Asyhari; Ong Wei Wen; Husnul Ajra; Yussuf Ahmed; ...

Keyword(s): Machine Learning; Feature Selection; Intrusion Detection; Feature Selection Techniques

Investigating the performance of the supervised learning algorithms for estimating NPPs parameters in combination with the different feature selection techniques

Annals of Nuclear Energy, 2021, Vol 158, pp. 108299
doi:10.1016/j.anucene.2021.108299

Author(s): Khalil Moshkbar-Bakhshayesh

Keyword(s): Feature Selection; Supervised Learning; Learning Algorithms; Supervised Learning Algorithms; Feature Selection Techniques

Feature-Selection and Mutual-Clustering Approaches to Improve DoS Detection and Maintain WSNs’ Lifetime

Sensors, 2021, Vol 21 (14), pp. 4821
doi:10.3390/s21144821

Author(s): Rami Ahmad; Raniyah Wazirali; Qusay Bsoul; Tarik Abu-Ain; Waleed Abu-Ain

Keyword(s): Machine Learning; Feature Selection; Open Field; Network Lifetime; Detection Efficiency; Denial Of Service; Harmony Search; Machine Learning Algorithms; Transport Layer; Feature Selection Techniques

Wireless Sensor Networks (WSNs) continue to face two major challenges: energy and security. Consequently, one WSN-related security task is to protect them from Denial of Service (DoS) and Distributed DoS (DDoS) attacks. Machine learning-based systems are the only viable option against these types of attacks, because traditional deep packet inspection systems depend on inspecting open fields in transport layer security packets, which the trend toward encrypting those fields undermines. Moreover, network traffic will become more complex as the amount of data transmitted between WSN nodes grows with increasing future usage. Feature selection techniques therefore need to be combined with machine learning to determine which data are most important in the DoS detection process. This paper examined techniques for improving DoS anomaly detection together with power conservation in WSNs, balancing the two. A new clustering technique, called the CH_Rotations algorithm, was introduced to improve anomaly detection efficiency over a WSN’s lifetime. Furthermore, the use of feature selection techniques with machine learning algorithms for examining WSN node traffic, and the effect of these techniques on WSN lifetime, were evaluated. The evaluation showed that Water Cycle (WC) feature selection achieved the best average accuracy, 2%, 5%, 3%, and 3% higher than Particle Swarm Optimization (PSO), Simulated Annealing (SA), Harmony Search (HS), and Genetic Algorithm (GA), respectively. Moreover, WC with a Decision Tree (DT) classifier achieved 100% accuracy with only one feature. In addition, the CH_Rotations algorithm improved network lifetime by 30% compared with the standard LEACH protocol. Using the WC + DT technique reduced network lifetime by 5% compared with scenarios without WC + DT.
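
The abstract compares Water Cycle feature selection against several metaheuristics, including Simulated Annealing (SA), without specifying any of them. As a minimal sketch of how such wrapper methods search over feature subsets, the Python code below implements plain simulated annealing over binary feature masks. It is not the paper's implementation and does not reproduce the Water Cycle algorithm. The `fitness` callback is an assumption: in a real wrapper it would typically be a classifier's validation accuracy (for example, a decision tree's) minus a penalty on the number of selected features; here a toy additive function stands in for it. Parameter names and defaults are illustrative.

```python
import math
import random
from typing import Callable, Tuple

Mask = Tuple[bool, ...]  # mask[i] is True when feature i is selected


def anneal_features(n_features: int, fitness: Callable[[Mask], float],
                    steps: int = 500, t0: float = 1.0, cooling: float = 0.99,
                    seed: int = 0) -> Tuple[Mask, float]:
    """Simulated annealing over feature subsets; returns the best mask seen and its fitness."""
    rng = random.Random(seed)
    current: Mask = tuple(rng.random() < 0.5 for _ in range(n_features))
    current_fit = fitness(current)
    best, best_fit = current, current_fit
    temp = t0
    for _ in range(steps):
        i = rng.randrange(n_features)  # neighbour: flip one feature in or out
        cand = current[:i] + (not current[i],) + current[i + 1:]
        cand_fit = fitness(cand)
        delta = cand_fit - current_fit
        # Metropolis rule: always accept improvements, accept worse moves with prob exp(delta / T).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_fit = cand, cand_fit
            if current_fit > best_fit:
                best, best_fit = current, current_fit
        temp = max(temp * cooling, 1e-12)  # geometric cooling, floored to avoid division by zero
    return best, best_fit


if __name__ == "__main__":
    # Toy fitness: features 0 and 2 are useful, and each selected feature costs 0.05.
    weights = [0.5, 0.0, 0.3, 0.01]

    def toy_fitness(mask: Mask) -> float:
        return sum(w for w, keep in zip(weights, mask) if keep) - 0.05 * sum(mask)

    print(anneal_features(len(weights), toy_fitness))
```

The per-feature penalty in the toy fitness mirrors the goal reported above of reaching high accuracy with very few features; swapping in a different acceptance rule or population-based update is what distinguishes SA from the other metaheuristics in the comparison.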