Intrusion Detection System Berbasis Seleksi Fitur Dengan Kombinasi Filter Information Gain Ratio Dan Correlation

Intrusion Detection System merupakan suatu sistem yang dikembangkan untuk memantau dan memfilter aktivitas jaringan dengan mengidentifikasi serangan. Karena jumlah data yang perlu diperiksa oleh IDS sangat besar dan banyaknya fitur-fitur asing yang dapat membuat proses analisis menjadi sulit untuk mendeteksi pola perilaku yang mencurigakan, maka IDS perlu mengurangi jumlah data yang akan diproses dengan cara mengurangi fitur yang dapat dilakukan dengan seleksi fitur. Pada penelitian ini mengkombinasikan dua metode perangkingan fitur yaitu Information Gain Ratio dan Correlation dan mengklasifikasikannya menggunakan algoritma K-Nearest Neighbor. Hasil perankingan dari kedua metode dibagi menjadi dua kelompok. Pada kelompok pertama dicari nilai mediannya dan untuk kelompok kedua dihapus. Lalu dilakukan klasifikasi K-Nearest Neighbor dengan menggunakan 10 kali validasi silang dan dilakukan pengujian dengan nilai k=5. Penerapan pemodelan yang diusulkan menghasilkan akurasi tertinggi sebesar 99.61%. Sedangkan untuk akurasi tanpa seleksi fitur menghasilkan akurasi tertinggi sebesar 99.59%. AbstractIntrusion Detection System is a system that was developed for monitoring and filtering activity in network with identified of attack. Because of the amount of the data that need to be checked by IDS is very large and many foreign feature that can make the analysis process difficult for detection suspicious pattern of behavior, so that IDS need for reduce amount of the data to be processed by reducing features that can be done by feature selection. In this study, combines two methods of feature ranking is Information Gain Ratio and Correlation and classify it using K-Nearest Neighbor algorithm. The result of feature ranking from the both methods divided into two groups. in the first group searched for the median value and in the second group is removed. Then do the classification of K-Nearest Neighbor using 10 fold cross validation and do the tests with values k=5. The result of the proposed modelling produce the highest accuracy of 99.61%. While the highest accuracy value of the not using the feature selection is 99.59%.

Download Full-text

Anomaly-Based Intrusion Detection: Feature Selection and Normalization Influence to the Machine Learning Models Accuracy

European Journal of Engineering and Formal Sciences ◽

10.26417/ejef.v2i3.p101-106 ◽

2018 ◽

Vol 2 (3) ◽

pp. 101

Author(s):

Danijela Protić ◽

Miomir Stanković

Keyword(s):

Feature Selection ◽

Intrusion Detection ◽

Decision Tree ◽

Network Traffic ◽

Intrusion Detection System ◽

Nearest Neighbor ◽

Reference Model ◽

Detection System ◽

Support Vector ◽

K Nearest Neighbor

Anomaly-based intrusion detection system detects intrusion to the computer network based on a reference model that has to be able to identify its normal behavior and flag what is not normal. In this process network traffic is classified into two groups by adding different labels to normal and malicious behavior. Main disadvantage of anomaly-based intrusion detection system is necessity to learn the difference between normal and not normal. Another disadvantage is the complexity of datasets which simulate realistic network traffic. Feature selection and normalization can be used to reduce data complexity and decrease processing runtime by selecting a better feature space This paper presents the results of testing the influence of feature selection and instances normalization to the classification performances of k-nearest neighbor, weighted k-nearest neighbor, support vector machines and decision tree models on 10 days records of the Kyoto 2006+ dataset. The data was pre-processed to remove all categorical features from the dataset. The resulting subset contained 17 features. Features containing instances which could not be normalized into the range [-1, 1] have also been removed. The resulting subset consisted of nine features. The feature ‘Label’ categorized network traffic to two classes: normal (1) and malicious (0). The performance metric to evaluate models was accuracy. Proposed method resulted in very high accuracy values with Decision Tree giving highest values for not-normalized and with k-nearest neighbor giving highest values for normalized data.Keywords: feature selection, normalization, k-NN, weighted k-NN, SVM, decision tree, Kyoto 2006+

Download Full-text

A novel ensemble modeling for intrusion detection system

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v10i2.pp1963-1971 ◽

2020 ◽

Vol 10 (2) ◽

pp. 1963

Author(s):

Pullagura Indira Priyadarsini ◽

G. Anuradha

Keyword(s):

Feature Selection ◽

Intrusion Detection ◽

Intrusion Detection System ◽

Nearest Neighbor ◽

Detection System ◽

Distance Functions ◽

Classification Model ◽

Support Vector ◽

K Nearest Neighbor ◽

Data Set

Vast increase in data through internet services has made computer systems more vulnerable and difficult to protect from malicious attacks. Intrusion detection systems (IDSs) must be more potent in monitoring intrusions. Therefore an effectual Intrusion Detection system architecture is built which employs a facile classification model and generates low false alarm rates and high accuracy. Noticeably, IDS endure enormous amounts of data traffic that contain redundant and irrelevant features, which affect the performance of the IDS negatively. Despite good feature selection approaches leads to a reduction of unrelated and redundant features and attain better classification accuracy in IDS. This paper proposes a novel ensemble model for IDS based on two algorithms Fuzzy Ensemble Feature selection (FEFS) and Fusion of Multiple Classifier (FMC). FEFS is a unification of five feature scores. These scores are obtained by using feature-class distance functions. Aggregation is done using fuzzy union operation. On the other hand, the FMC is the fusion of three classifiers. It works based on Ensemble decisive function. Experiments were made on KDD cup 99 data set have shown that our proposed system works superior to well-known methods such as Support Vector Machines (SVMs), K-Nearest Neighbor (KNN) and Artificial Neural Networks (ANNs). Our examinations ensured clearly the prominence of using ensemble methodology for modeling IDSs. And hence our system is robust and efficient.

Download Full-text

A feature selection algorithm combining information gain and multi-objective genetic search for intrusion detection system

MATEC Web of Conferences ◽

10.1051/matecconf/202133608008 ◽

2021 ◽

Vol 336 ◽

pp. 08008

Author(s):

Tao Xie

Keyword(s):

Feature Selection ◽

Intrusion Detection ◽

Intrusion Detection System ◽

Detection Rate ◽

Information Gain ◽

Detection System ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Genetic Search ◽

Multi Objective

In order to improve the detection rate and speed of intrusion detection system, this paper proposes a feature selection algorithm. The algorithm uses information gain to rank the features in descending order, and then uses a multi-objective genetic algorithm to gradually search the ranking features to find the optimal feature combination. We classified the Kddcup98 dataset into five classes, DOS, PROBE, R2L, and U2R, and conducted numerous experiments on each class. Experimental results show that for each class of attack, the proposed algorithm can not only speed up the feature selection, but also significantly improve the detection rate of the algorithm.

Download Full-text

Scrutinizing Attacks and Evaluating Performance Appraisal Parameters via Feature Selection in Intrusion Detection System

10.21203/rs.3.rs-748765/v1 ◽

2021 ◽

Author(s):

Navroop Kaur ◽

Meenakshi Bansal ◽

Sukhwinder Singh S

Keyword(s):

Feature Selection ◽

Performance Evaluation ◽

Intrusion Detection ◽

Intrusion Detection System ◽

Detection System ◽

Denial Of Service ◽

Cyber Attacks ◽

Support Vector ◽

K Nearest Neighbor ◽

Evaluation Parameters

Abstract In modern times the firewall and antivirus packages are not good enough to protect the organization from numerous cyber attacks. Computer IDS (Intrusion Detection System) is a crucial aspect that contributes to the success of an organization. IDS is a software application responsible for scanning organization networks for suspicious activities and policy rupturing. IDS ensures the secure and reliable functioning of the network within an organization. IDS underwent huge transformations since its origin to cope up with the advancing computer crimes. The primary motive of IDS has been to augment the competence of detecting the attacks without endangering the performance of the network. The research paper elaborates on different types and different functions performed by the IDS. The NSL KDD dataset has been considered for training and testing. The seven prominent classifiers LR (Logistic Regression), NB (Naïve Bayes), DT (Decision Tree), AB (AdaBoost), RF (Random Forest), kNN (k Nearest Neighbor), and SVM (Support Vector Machine) have been studied along with their pros and cons and the feature selection have been imposed to enhance the reading of performance evaluation parameters (Accuracy, Precision, Recall, and F1Score). The paper elaborates a detailed flowchart and algorithm depicting the procedure to perform feature selection using XGB (Extreme Gradient Booster) for four categories of attacks: DoS (Denial of Service), Probe, R2L (Remote to Local Attack), and U2R (User to Root Attack). The selected features have been ranked as per their occurrence. The implementation have been conducted at five different ratios of 60-40%, 70-30%, 90-10%, 50-50%, and 80-20%. Different classifiers scored best for different performance evaluation parameters at different ratios. NB scored with the best Accuracy and Recall values. DT and RF consistently performed with high accuracy. NB, SVM, and kNN achieved good F1Score.

Download Full-text

The Application of Multi-Class Support Vector Machines on Intrusion Detection System with the Feature Selection using Information Gain

Proceedings of the 1st Annual International Conference on Mathematics, Science, and Education (ICoMSE 2017) ◽

10.2991/icomse-17.2018.1 ◽

2018 ◽

Author(s):

Jihan Maharani ◽

Zuherman Rustam

Keyword(s):

Feature Selection ◽

Support Vector Machines ◽

Intrusion Detection ◽

Intrusion Detection System ◽

Information Gain ◽

Detection System ◽

Support Vector ◽

Vector Machines

Download Full-text

METODE TRANSFORMASI CAKAR AYAM UNTUK MEREDUKSI SEARCH-SPACE PADA INTRUSION DETECTION SYSTEM BERBASIS K-NEAREST NEIGHBOR

Jurnal Ilmiah Teknologi Infomasi Terapan ◽

10.33197/jitter.vol2.iss2.2016.97 ◽

2016 ◽

Vol 2 (2) ◽

Author(s):

Kharisma Muchammad ◽

Thomas Brian

Keyword(s):

Intrusion Detection ◽

Intrusion Detection System ◽

Nearest Neighbor ◽

Detection System ◽

Search Space ◽

Group Method ◽

K Nearest Neighbor ◽

Pair Group ◽

Space Reduction ◽

Unweighted Pair Group Method

[Id] Penggunaan Intrusion Detection System (IDS) pada jaringan komputer merupakan hal yang diperlukan untuk menjaga keamanan jaringan. Beberapa IDS berbasis K-nearest neighbor (KNN) memiliki akurasi yang relatif baik namun jika data training terlalu besar, waktu yang diperlukan untuk mendeteksi serangan juga meningkat. Waktu untuk deteksi bisa ditekan dengan mereduksi search space pada data training. Namun problem reduksi search space dengan mempertahankan kualitas deteksi masih merupakan problem terbuka. Pada artikel ini diajukan suatu metode transformasi "cakar ayam" berbasis jumlah jarak data ke centroid dan jarak data ke dua sub-centroid untuk mereduksi search space pada IDS berbasis K-nearest neighbor. Localized K-nearest neighbor dilakukan pada data yang telah tertransformasi. Eksperimen menggunakan agglome-rative hierarchial clustering dengan Unweighted Pair-Group Method of Centroid pada dataset NSL-KDD 20% menunjukkan penurunan search space maksimum sebesar 38% dengan tingkat akurasi sebesar 77.5%. Tingkat akurasi dan specificity maksimum yang dicapai pada eksperimen sebesar 88% dan 88.3% dengan tingkat reduksi sebesar 12% dan tingkat sensitifity maksimum yang dicapai sebesar 80.2% pada tingkat reduksi 11%. Berdasarkan eksperimen, luas search space dapat dikurangi sambil menjaga akurasi deteksi. Rasio tradeoff antara akurasi dan search space mungkin dapat diperbaiki dengan mengganti algortima clustering dengan divisive hierarchial clustring. Abstract : clustering, deteksi intrusi, keamanan jaringan [En] Intrusion detection System (IDS) for computer network has became an essential needs to ensure network security. Some K-nearest neighbor (KNN) based IDS have a relatively good accracy in detecting attack, but the need to use all training data costs time consumption . Detection time cost can be reduced by reducing search space needed for the algorithm. The problem of search space reduction while maintaining decent accuracy still an open problem. In this Paper we propose a new transformation method "chiken claw" method. which based on sum of two distances. The first distance is the distance of data and its cluster. The later is distance of data to 2 of its cluster's sub-centroid..This method is proposed to reduce the search space on K-nearest neighbor based IDS because the search is based on resulted one dimentional transformed data. Experiment using Unweighted Pair-group Method of centroid on NSL-KDD 20% shows maximum search space reduction 38% with 75% accuracy. Maximum accuracy and sensitivity in the experiment is 88% and 88.3% respectively with space reduction 12%. Maximum sensitivity from experiment is at 80.2% with 11% space reduction. Based on experiments, search space can be reduced while maintaining accuracy. Search space-accuracy trade off might be improved by using different clustering algorithm such as divisive hierarchial clustering

Download Full-text

Intrusion Detection System for IP Multimedia Subsystem using K-Nearest Neighbor classifier

2008 IEEE International Multitopic Conference ◽

10.1109/inmic.2008.4777775 ◽

2008 ◽

Cited By ~ 2

Author(s):

Ashfaq Hussain Farooqi ◽

Ali Munir

Keyword(s):

Intrusion Detection ◽

Intrusion Detection System ◽

Nearest Neighbor ◽

Detection System ◽

Ip Multimedia Subsystem ◽

K Nearest Neighbor ◽

Nearest Neighbor Classifier ◽

Neighbor Classifier

Download Full-text

Influence of pre-processing on anomaly-based intrusion detection

Vojnotehnicki glasnik ◽

10.5937/vojtehg68-27319 ◽

2020 ◽

Vol 68 (3) ◽

pp. 598-611 ◽

Cited By ~ 1

Author(s):

Danijela Protić

Keyword(s):

Feature Selection ◽

Intrusion Detection ◽

Nearest Neighbor ◽

Computer Network ◽

Reference Model ◽

Detection System ◽

False Negative ◽

K Nearest Neighbor ◽

Dataset Size ◽

Binary Classifiers

Introduction/purpose: The anomaly-based intrusion detection system detects intrusions based on a reference model which identifies the normal behavior of a computer network and flags an anomaly. Machine-learning models classify intrusions or misuse as either normal or anomaly. In complex computer networks, the number of training records is large, which makes the evaluation of the classifiers computationally expensive. Methods: A feature selection algorithm that reduces the dataset size is presented in this paper. Results: The experiments are conducted on the Kyoto 2006+ dataset and four classifier models: feedforward neural network, k-nearest neighbor, weighted k-nearest neighbor, and medium decision tree. The results show high accuracy of the models, as well as low false positive and false negative rates. Conclusion: The three-step pre-processing algorithm for feature selection and instance normalization resulted in improving performances of four binary classifiers and in decreasing processing time.

Download Full-text

K-Nearest Neighbor and Boundary Cutting Algorithm for Intrusion Detection System

Advances in Intelligent Systems and Computing - Information Systems Design and Intelligent Applications ◽

10.1007/978-81-322-2752-6_26 ◽

2016 ◽

pp. 269-278 ◽

Cited By ~ 1

Author(s):

Punam Mulak ◽

D. P. Gaikwad ◽

N. R. Talhar

Keyword(s):

Intrusion Detection ◽

Intrusion Detection System ◽

Nearest Neighbor ◽

Detection System ◽

K Nearest Neighbor ◽

Cutting Algorithm

Download Full-text

Comparative Analysis of Intrusion Detection Attack Based on Machine Learning Classifiers

Indian Journal of Artificial Intelligence and Neural Networking ◽

10.35940/ijainn.b1025.041221 ◽

2021 ◽

Vol 1 (2) ◽

pp. 22-28

Author(s):

Surafel Mehari Atnafu ◽

Anuja Kumar Acharya

Keyword(s):

Machine Learning ◽

Intrusion Detection ◽

Intrusion Detection System ◽

Nearest Neighbor ◽

Detection System ◽

Machine Learning Techniques ◽

K Nearest Neighbor ◽

Machine Learning Classifiers ◽

Learning Classifiers ◽

Learning Techniques

In current day information transmitted from one place to another by using network communication technology. Due to such transmission of information, networking system required a high security environment. The main strategy to secure this environment is to correctly identify the packet and detect if the packet contains a malicious and any illegal activity happened in network environments. To accomplish this, we use intrusion detection system (IDS). Intrusion detection is a security technology that design detects and automatically alert or notify to a responsible person. However, creating an efficient Intrusion Detection System face a number of challenges. These challenges are false detection and the data contain high number of features. Currently many researchers use machine learning techniques to overcome the limitation of intrusion detection and increase the efficiency of intrusion detection for correctly identify the packet either the packet is normal or malicious. Many machine-learning techniques use in intrusion detection. However, the question is which machine learning classifiers has been potentially to address intrusion detection issue in network security environment. Choosing the appropriate machine learning techniques required to improve the accuracy of intrusion detection system. In this work, three machine learning classifiers are analyzed. Support vector Machine, Naïve Bayes Classifier and K-Nearest Neighbor classifiers. These algorithms tested using NSL KDD dataset by using the combination of Chi square and Extra Tree feature selection method and Python used to implement, analyze and evaluate the classifiers. Experimental result show that K-Nearest Neighbor classifiers outperform the method in categorizing the packet either is normal or malicious.

Download Full-text