A Hybrid Approach for the Analysis of Feature Selection using Information Gain and BAT Techniques on The Anomaly Detection

Every day, millions of people in many institutions communicate with each other over the Internet. The past two decades have witnessed unprecedented levels of Internet use around the world. Alongside this rapid growth, attacks carried out over the Internet have been reported every minute, with ever-increasing frequency. In such a difficult environment, Anomaly Detection Systems (ADS) play an important role in monitoring and analyzing daily Internet activity for security breaches and threats. However, the data routinely generated by computer networks is usually enormous in size and largely of little use. This creates a major challenge for an ADS, which must examine all the features of a given dataset to identify intrusive patterns. Feature selection is therefore an important factor in modeling anomaly-based intrusion detection systems: an irrelevant feature can lead to overfitting, which in turn degrades the modeling power of classification algorithms. The objective of this study is to analyze and select the most discriminating input features for building effective and computationally efficient ADS schemes. In the first step, a heuristic algorithm called IG-BA (Information Gain combined with the BAT algorithm) is proposed for dimensionality reduction; it selects the optimal feature subset based on the concept of entropy. The relevant and meaningful features are then passed to a number of classifiers. Experiments were conducted on the CICIDS-2017 dataset using (1) Random Forest (RF), (2) Bayes Network (BN), (3) Naive Bayes (NB), (4) J48 and (5) Random Tree (RT), and the results show better detection precision and faster execution time. The proposed heuristic algorithm outperforms existing ones, being both more accurate in detection and faster.
Among these classifiers, Random Forest emerges as the best, outperforming the others in accuracy on the optimally selected features.
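The abstract says IG-BA selects features "based on the concept of entropy". The entropy-based information-gain score that such a filter ranks features by can be sketched in a few lines of Python. This is a minimal illustration, assuming the features have already been discretised (most CICIDS-2017 flow features are continuous and would need binning first); the feature names and toy data are hypothetical, and this is not the authors' implementation.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(Y), in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - sum over values v of P(X=v) * H(Y | X=v)."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return entropy(labels) - conditional

def rank_features(columns, labels):
    """Return (feature name, IG) pairs, most informative first."""
    scores = {name: information_gain(col, labels) for name, col in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical, already-discretised flow features for four labelled flows.
labels = ["attack", "attack", "benign", "benign"]
columns = {
    "syn_flag": [1, 1, 0, 0],                    # separates the classes perfectly
    "protocol": ["tcp", "udp", "tcp", "udp"],    # independent of the class
}
print(rank_features(columns, labels))  # [('syn_flag', 1.0), ('protocol', 0.0)]
```

A filter stage would keep the top-ranked features (or those above an IG cut-off) before any search or classification step.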

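The BAT half of IG-BA searches over feature subsets. A common way to apply the bat algorithm to feature selection is to encode each bat as a 0/1 mask over the features and convert its velocity into bit probabilities with a sigmoid transfer function. The sketch below follows that general scheme under stated assumptions: the parameter values (frequency range, loudness decay `alpha`, pulse-rate constant `gamma`, velocity clamp `v_max`), the single-bit-flip local search, and the toy fitness function are hypothetical choices, not the settings used in the paper. In practice the fitness might be a classifier's validation accuracy or a sum of information-gain scores over the selected features; the abstract does not say which.

```python
import math
import random

def binary_bat_search(fitness, n_features, n_bats=20, n_iter=50,
                      f_min=0.0, f_max=2.0, alpha=0.9, gamma=0.9,
                      v_max=6.0, seed=0):
    """Maximise fitness(mask) over 0/1 feature masks with a simplified binary bat algorithm."""
    rng = random.Random(seed)
    bats = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_bats)]
    vel = [[0.0] * n_features for _ in range(n_bats)]
    loud = [1.0] * n_bats                          # loudness A_i
    r0 = [rng.random() for _ in range(n_bats)]     # initial pulse rates r_i^0
    pulse = r0[:]                                  # current pulse rates r_i
    scores = [fitness(b) for b in bats]
    best_i = max(range(n_bats), key=lambda i: scores[i])
    best, best_score = bats[best_i][:], scores[best_i]

    for t in range(1, n_iter + 1):
        for i in range(n_bats):
            freq = f_min + (f_max - f_min) * rng.random()
            cand = []
            for j in range(n_features):
                v = vel[i][j] + (bats[i][j] - best[j]) * freq
                vel[i][j] = max(-v_max, min(v_max, v))   # clamp velocity
                # Sigmoid transfer: velocity -> probability that bit j is 1.
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))
                cand.append(1 if rng.random() < prob else 0)
            if rng.random() > pulse[i]:
                # Local search around the global best: flip one random bit.
                cand = best[:]
                k = rng.randrange(n_features)
                cand[k] = 1 - cand[k]
            cand_score = fitness(cand)
            # Accept with probability given by loudness; then get quieter, pulse more.
            if cand_score >= scores[i] and rng.random() < loud[i]:
                bats[i], scores[i] = cand, cand_score
                loud[i] *= alpha
                pulse[i] = r0[i] * (1.0 - math.exp(-gamma * t))
            if cand_score > best_score:
                best, best_score = cand[:], cand_score
    return best, best_score


# Hypothetical fitness: reward two "relevant" features, penalise subset size.
RELEVANT = {0, 3}

def toy_fitness(mask):
    return sum(mask[j] for j in RELEVANT) - 0.1 * sum(mask)

best_mask, best_value = binary_bat_search(toy_fitness, n_features=6)
print(best_mask, best_value)
```

Because the global best is only replaced by a strictly better mask, running more iterations from the same seed can never return a worse score than the best mask in the initial population.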
A Heuristic Algorithm for Feature Selection Based on Optimization Techniques

Heuristic and Optimization for Knowledge Discovery

10.4018/978-1-930708-26-6.ch002

2011

pp. 13-26

Cited By ~ 5

Author(s):

A. M. Bagirov

A. M. Rubinov

J. Yearwood

Keyword(s):

Feature Selection

Heuristic Algorithm

Real World

Numerical Experiments

Optimization Techniques

Selection Problem

Feature Selection Problem

Selection Of

The feature selection problem involves selecting a subset of features that is sufficient for determining structures or clusters in a given dataset and for making predictions. This chapter presents an algorithm for feature selection based on optimization methods. To verify the effectiveness of the proposed algorithm, we applied it to a number of publicly available real-world databases. The results of numerical experiments are presented and discussed; they demonstrate that the algorithm performs well on the datasets considered.

Random Forest Based Feature Selection of Macroeconomic Variables for Stock Market Prediction

American Journal of Applied Sciences

10.3844/ajassp.2019.200.212

2019

Vol 16 (7)

pp. 200-212

Cited By ~ 2

Author(s):

Isaac Kofi Nti

Adebayo Felix Adekoya

Benjamin Asubam Weyori

Keyword(s):

Feature Selection

Random Forest

Stock Market

Stock Market Prediction

Macroeconomic Variables

Selection Of

Fusion of Feature Selection and Random Forest for an Anomaly-Based Intrusion Detection System

Journal of Computational and Theoretical Nanoscience

10.1166/jctn.2019.8332

2019

Vol 16 (8)

pp. 3603-3607

Cited By ~ 1

Author(s):

Shraddha Khonde

V. Ulagamuthalvi

Keyword(s):

Feature Selection

Random Forest

Intrusion Detection

Real Time

Intrusion Detection System

New Technologies

Detection System

Sensitive Data

Detection Systems

New Type

Given the current network landscape, hackers and intruders have become a major threat. As new technologies emerge rapidly and computers are used ever more extensively, security plays an important role. Most computers on a network can easily be compromised by attacks, and a major concern is the rise in new types of attack. Securing sensitive data is a serious challenge that must be treated as a high-priority issue and addressed immediately. Highly efficient Intrusion Detection Systems (IDS) that detect various types of network attacks are available today, but what is needed is an IDS intelligent enough to detect and analyze all types of new threats on the network, and maximum accuracy is expected of any such intelligent system. An Intrusion Detection System can be hardware or software that monitors and analyzes all network activity to detect malicious activity inside the network. It also alerts the administrator and helps them deal with malicious packets which, if they enter the network, can harm many of the connected computers. In our work, we have implemented an intelligent IDS that helps the administrator analyze real-time network traffic by classifying packets entering the system as normal or malicious. This paper mainly focuses on the feature selection techniques used to reduce the number of features in the KDD-99 dataset. It also explains the classification algorithm used, Random Forest, which uses a forest of trees to classify real-time packets as normal or malicious. Random Forest relies on ensembling: its final output is derived by combining the outputs of the individual trees that make up the forest. The KDD-99 dataset is used in the experiments to train all the trees, improving accuracy with the help of Random Forest.
The results show that the Random Forest algorithm achieves higher accuracy in a distributed network with a reduced false alarm rate.
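The ensembling idea in this abstract (many trees, one combined output) can be illustrated with a deliberately tiny forest in plain Python. In this sketch each "tree" is a depth-1 decision stump trained on a bootstrap sample using a random subset of features, and the forest predicts by majority vote. Stumps, the subset size, the number of trees, and the two packet features are simplifying assumptions for illustration, not the paper's configuration.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature_ids):
    """Depth-1 tree: pick the (feature, threshold) split with the fewest errors."""
    overall = Counter(labels).most_common(1)[0][0]
    best = None
    for f in feature_ids:
        for thr in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= thr]
            right = [y for r, y in zip(rows, labels) if r[f] > thr]
            # Each side predicts its majority class (empty side: overall majority).
            l_cls = Counter(left).most_common(1)[0][0] if left else overall
            r_cls = Counter(right).most_common(1)[0][0] if right else overall
            errors = sum(y != l_cls for y in left) + sum(y != r_cls for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, thr, l_cls, r_cls)
    _, f, thr, l_cls, r_cls = best
    return lambda row: l_cls if row[f] <= thr else r_cls

def fit_forest(rows, labels, n_trees=15, n_sub_features=1, seed=0):
    """Train each stump on a bootstrap sample using a random subset of features."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]        # sample rows with replacement
        feats = rng.sample(range(d), n_sub_features)      # random feature subset
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Ensemble output: majority vote over the individual trees."""
    return Counter(tree(row) for tree in forest).most_common(1)[0][0]

# Hypothetical per-flow features: [packets_per_second, mean_packet_bytes].
rows = [[10, 60], [12, 64], [11, 70], [9, 58],
        [900, 1400], [950, 1500], [880, 1450], [1000, 1350]]
labels = ["normal"] * 4 + ["malicious"] * 4
forest = fit_forest(rows, labels)
print(predict(forest, [5, 40]), predict(forest, [2000, 3000]))
```

Real random forests grow much deeper trees; the bootstrap sampling, per-tree feature randomness, and vote aggregation shown here are the parts that make it an ensemble.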

Multivariable Heuristic Approach to Intrusion Detection in Network Environments

Entropy

10.3390/e23060776

2021

Vol 23 (6)

pp. 776

Author(s):

Marcin Niemiec

Rafał Kościej

Bartłomiej Gdowski

Keyword(s):

Intrusion Detection

Heuristic Algorithm

Detection Algorithm

Intrusion Detection Systems

The Internet

Detection Thresholds

Detection Systems

Different Types

Ongoing Development

Network Environments

The Internet is an inseparable part of our contemporary lives, which makes protection against threats and attacks crucial for major companies and individual users alike. There is a demand for the ongoing development of methods for ensuring security in cyberspace. Intrusion detection systems, which detect attacks in network environments and respond appropriately, are a crucial cybersecurity solution. This article presents a new multivariable heuristic intrusion detection algorithm based on different types of flags and on entropy values. The data is shared by organisations to help increase the effectiveness of intrusion detection. The authors also propose default values for the parameters of the heuristic algorithm and for the detection thresholds. The solution has been implemented in a well-known open-source system and verified with a series of tests. Additionally, the authors investigated how updating the variables affects the intrusion detection process. The results confirmed the effectiveness of the proposed approach and heuristic algorithm.
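The general idea of raising an alert when the entropy of several traffic variables drifts past a threshold can be illustrated with a small Python sketch. It is not the authors' algorithm: the two variables tracked here (TCP flags and destination ports), the baselines taken from a single "normal" window, the thresholds of 1.0 bit, and the `min_hits` voting rule are all hypothetical stand-ins for the parameters and default values the article proposes.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Entropy, in bits, of the empirical distribution of `values`."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())

def detect(window, baselines, thresholds, min_hits=1):
    """Return the variables whose entropy leaves its allowed band, and an alert flag."""
    hits = [name for name, values in window.items()
            if abs(shannon_entropy(values) - baselines[name]) > thresholds[name]]
    return hits, len(hits) >= min_hits

# Hypothetical traffic windows: TCP flags and destination ports seen in each.
normal = {"tcp_flags": ["SYN", "ACK", "PSH", "ACK", "FIN", "ACK", "SYN", "PSH"],
          "dst_port": [80, 443, 22, 80, 53, 443, 8080, 25]}
syn_flood = {"tcp_flags": ["SYN"] * 8, "dst_port": [80] * 8}

baselines = {name: shannon_entropy(v) for name, v in normal.items()}
thresholds = {"tcp_flags": 1.0, "dst_port": 1.0}   # hypothetical detection thresholds
print(detect(normal, baselines, thresholds, min_hits=2))     # ([], False)
print(detect(syn_flood, baselines, thresholds, min_hits=2))  # (['tcp_flags', 'dst_port'], True)
```

Requiring several variables to deviate at once (`min_hits`) is one simple way to trade detection sensitivity against false alarms.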

An Intruder Detection System based on Feature Selection using Random Forest Algorithm

International Journal of Engineering and Advanced Technology - Regular Issue

10.35940/ijeat.b5154.129219

2019

Vol 9 (2)

pp. 5525-5529

Keyword(s):

Feature Selection

Random Forest

Digital Literacy

Intrusion Detection System

Detection System

Credit Cards

Principal Component

Training Data

The Internet

Internet Applications

Digital literacy is growing tremendously in every part of the world, and people increasingly use digital devices to access Internet-based applications. As a result, the Internet has become a primary requirement for everyone, and many business transactions now take place conveniently across the network. On the other hand, intruders carry out intrusions and activities such as capturing passwords, compromising routes, and collecting credit card details, so many malicious activities now take place over the network. Host-based and network-based Intrusion Detection Systems (IDS) have previously been used to control network intruders, but these techniques could not reliably stop intruders who use encrypted packets or spoofed network IDs. It is essential to examine such attacks periodically to identify the patterns of recent attacks. In this paper, the authors propose a model based on deep learning, using the NSL-KDD dataset, to address these problems. Principal component analysis is applied for feature selection, and the model is then trained on the data with a random forest classifier. The model is designed to detect intruder patterns effectively using the knowledge gained from the training data, and it detects malicious patterns on the network with an accuracy of around 90 percent.
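The preprocessing step described above, PCA ahead of a random forest, can be sketched in plain Python by finding the leading principal component with power iteration and projecting each record onto it. This is a minimal illustration under assumptions: it keeps a single component, uses a hypothetical two-feature toy dataset, and omits the random forest training. Strictly speaking, PCA builds new combined features rather than selecting a subset of the original NSL-KDD columns.

```python
import math

def top_principal_component(rows, n_iter=100):
    """Leading eigenvector of the sample covariance matrix, found by power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0 / (j + 1) for j in range(d)]   # arbitrary non-zero starting vector
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:          # no variance along the direction explored
            break
        v = [x / norm for x in w]
    return means, v

def project(rows, means, component):
    """Score each centred row on the component, giving one reduced feature per row."""
    return [sum((r[j] - means[j]) * component[j] for j in range(len(r))) for r in rows]

# Hypothetical 2-D records whose variance lies along the diagonal.
rows = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
means, pc1 = top_principal_component(rows)
print(pc1)                         # roughly [0.7071, 0.7071]
print(project(rows, means, pc1))
```

The projected scores, rather than the raw columns, would then be fed to the classifier; keeping more components means repeating the iteration on the deflated covariance matrix.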

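Sentiment Analysis of Film Reviews Using the Modified Balanced Random Forest Method and Mutual Information (original title: Analisis Sentimen Terhadap Review Film Menggunakan Metode Modified Balanced Random Forest dan Mutual Information)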

JURNAL MEDIA INFORMATIKA BUDIDARMA

10.30865/mib.v5i2.2844

2021

Vol 5 (2)

pp. 415

Author(s):

Firdausi Nuzula Zamzami

Adiwijaya Adiwijaya

Mahendra Dwifebri P

Keyword(s):

Machine Learning

Feature Selection

Random Forest

Mutual Information

Sentiment Analysis

Information Exchange

The Internet

Machine Learning Method

Learning Method

Internet Information

Information exchange is currently one of the most common activities on the Internet. It can take many forms, such as expressing opinions on social media, one of which is reviewing a film. When people review a film, they use their emotions to express their feelings, which can be positive or negative. The fast growth of the Internet has made information more diverse, plentiful and unstructured. Sentiment analysis can handle this, because it is a classification process, carried out automatically by a computer system, for understanding the opinions, interactions, and emotions in a document or text. One suitable machine learning method is the Modified Balanced Random Forest, and Mutual Information is used for feature selection to deal with the varied data. Combining these two methods, the system achieves an accuracy of 79% and an F1-score of 75%.
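Mutual-information feature selection ranks each candidate term by how much knowing whether it appears reduces uncertainty about the sentiment label, I(X;Y) = Σ p(x,y) log2[p(x,y) / (p(x) p(y))]. The sketch below computes this from counts for binary term-presence features and keeps the top k terms. The toy reviews and whitespace tokenisation are assumptions, and the Modified Balanced Random Forest classifier that would consume the selected terms is not shown.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """I(X;Y) in bits from paired samples of two discrete variables."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(c / n * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in joint.items())

def select_terms(docs, labels, k):
    """Keep the k terms whose presence/absence shares the most information with the label."""
    vocab = sorted({w for d in docs for w in d.split()})
    scores = {}
    for term in vocab:
        presence = [term in d.split() for d in docs]
        scores[term] = mutual_information(presence, labels)
    # Highest MI first; ties broken alphabetically for a stable result.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k], scores

# Hypothetical toy reviews.
docs = ["great acting great plot", "great film", "boring plot", "boring film"]
labels = ["pos", "pos", "neg", "neg"]
top, scores = select_terms(docs, labels, k=2)
print(top)  # ['boring', 'great']
```

Terms such as "film" and "plot" appear equally often in both classes here, so their mutual information with the label is zero and they are dropped.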

A Risk Prediction Model for Type 2 Diabetes Based on Weighted Feature Selection of Random Forest and XGBoost Ensemble Classifier

2019 Eleventh International Conference on Advanced Computational Intelligence (ICACI)

10.1109/icaci.2019.8778622

2019

Cited By ~ 3

Author(s):

Zhongxian Xu

Zhiliang Wang

Keyword(s):

Type 2 Diabetes

Feature Selection

Random Forest

Prediction Model

Risk Prediction

Ensemble Classifier

Risk Prediction Model

Selection Of

Feature selection of Complex Power Quality Disturbances and Parameter Optimization of Random Forest

2019 4th International Conference on Intelligent Green Building and Smart Grid (IGBSG)

10.1109/igbsg.2019.8886280

2019 ◽

Author(s):

Renming Wang

Hongyang Wang

Lingyun Wang

Keyword(s):

Feature Selection

Random Forest

Power Quality

Parameter Optimization

Complex Power

Power Quality Disturbances

Selection Of

Analysis of NSL KDD Dataset Using Classification Algorithms for Intrusion Detection System

Recent Patents on Engineering

10.2174/1872212112666180402122150

2019

Vol 13 (2)

pp. 142-147

Author(s):

Srishti Sharma

Yogita Gigras

Rita Chhikara

Anuradha Dhull

Keyword(s):

Random Forest

Intrusion Detection

Detection System

Random Trees

Attribute Selection

Classification Algorithms

Random Forest Classification

Detection Systems

Forest Classification

Feature Attribute

Background: Intrusion detection systems are responsible for detecting anomalies and network attacks. Building an effective IDS depends on the available dataset, which is used to train and test an intelligent IDS. In this research, the NSL-KDD dataset (an improvement over the original KDD Cup 1999 dataset) is used, because KDD'99 contains a huge number of redundant records that make it difficult to process the data accurately. Methods: The classification techniques applied to analyze this dataset are the decision-tree methods J48, Random Forest and Random Trees. Results: On comparison of these three classification algorithms, Random Forest produced the best results, so the Random Forest classification method was used for further analysis of the data. The results, obtained by applying feature/attribute selection over all possible combinations, are analyzed and depicted in the paper. Conclusion: A total of eight significant attributes were selected after applying various attribute selection methods to the NSL-KDD dataset.
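Attribute selection "by applying all the possible combinations" amounts to exhaustive subset search: score every non-empty subset and keep the best. The sketch below shows that loop with `itertools.combinations`. The scoring function is a hypothetical stand-in for the accuracy of a Random Forest trained on each subset, and the weights are invented for illustration. Note the cost: d features give 2^d − 1 non-empty subsets, which for the 41 NSL-KDD features is 2^41 − 1 ≈ 2.2 × 10^12, so in practice the search is only feasible over a small candidate set or with a size limit (`max_size`).

```python
from itertools import combinations

def best_subset(features, score, max_size=None):
    """Score every non-empty subset (up to max_size features) and keep the best."""
    max_size = max_size or len(features)
    best, best_score = None, float("-inf")
    for k in range(1, max_size + 1):
        for subset in combinations(features, k):
            s = score(subset)
            if s > best_score:   # strict '>' keeps the first, i.e. smallest, subset on ties
                best, best_score = subset, s
    return best, best_score

# Hypothetical stand-in for "accuracy of a Random Forest trained on this subset".
USEFUL = {"src_bytes": 0.20, "flag": 0.15, "count": 0.05}

def toy_score(subset):
    return 0.5 + sum(USEFUL.get(f, 0.0) for f in subset) - 0.01 * len(subset)

features = ["duration", "src_bytes", "flag", "count", "land"]
print(best_subset(features, toy_score))  # best subset: ('src_bytes', 'flag', 'count')
```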