Intrusion Detection System using SMIFS and Multi class Multi layer Perceptron

As new technologies emerge, data is generated in larger volumes and higher dimensions. High dimensionality can pose a great challenge for classification. Redundant features and noisy data degrade the performance of the model, so it is necessary to extract the relevant features from a given data set. Feature extraction is an important step in many machine learning algorithms, and many researchers have attempted it. Among the different feature extraction methods, mutual information is a widely used feature selection criterion because it quantifies the dependency among features well in classification problems. To cope with this issue, in this paper we propose a simplified mutual information based feature selection (SMIFS) method with less computational overhead. The selected feature subset is evaluated with a multilayer perceptron on the KDD CUP 99 data set with 2-class, 5-class and 4-class classification. The accuracy of these models is almost the same with fewer features.
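
The mutual-information ranking that filter methods of this kind build on can be sketched as follows. This is a minimal illustration for discrete features, not the SMIFS procedure itself; the function names, the toy records and the top-k cutoff are assumptions made for the example.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Estimate I(X;Y) in bits from two equal-length sequences of discrete values."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


def rank_features(rows, labels, k):
    """Return the indices of the k features with the highest MI with the labels."""
    scores = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        scores.append((mutual_information(column, labels), j))
    # Highest score first; ties broken by feature index for determinism
    scores.sort(key=lambda t: (-t[0], t[1]))
    return [j for _, j in scores[:k]]


# Hypothetical toy records: (protocol, flag, constant field)
rows = [
    ("tcp", "SF", 0),
    ("tcp", "S0", 0),
    ("udp", "SF", 0),
    ("udp", "S0", 0),
]
labels = ["normal", "attack", "normal", "attack"]
top = rank_features(rows, labels, k=1)
```

In this toy data only the second column carries information about the label, so it is the one retained; continuous features would first need to be discretized or handled with a density-based estimator.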

2021 ◽  
Vol 1 (1) ◽  
pp. 66-83
Author(s):  
Rawaa Ismael Farhan ◽  
Abeer Tariq Maolood ◽  
Nidaa Flaih Hassan

A Network Intrusion Detection System (NIDS) detects normal and malicious behavior by analyzing network traffic; this analysis has the potential to detect novel attacks, especially in IoT environments. Deep Learning (DL) has outperformed classical machine learning algorithms on complex real-world problems such as NIDS, although it needs more computational resources and takes longer. Feature selection plays a significant role in choosing only the features that best describe the target concept during classification. However, selecting such relevant features becomes difficult when the number of features is large. Therefore, this paper proposes Enhanced BPSO, which combines Binary Particle Swarm Optimization (BPSO) with the classical statistical correlation-based feature selection (CFS) approach to address this problem in BPSO feature selection. The selected feature subset is evaluated with Deep Neural Network (DNN) classifiers on the new flow-based CSE-CIC-IDS2018 dataset. Experimental results show a high accuracy of 95%, with processing time, detection rate, and false alarm rate compared against other benchmark classifiers.
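
For orientation, a generic binary PSO loop with a sigmoid transfer function might look like the sketch below. It is not the Enhanced BPSO of the abstract: the CFS component is omitted, and the fitness function, the `RELEVANT` set, and all parameter values are hypothetical stand-ins chosen only to make the example run.

```python
import math
import random


def bpso(fitness, n_bits, n_particles=10, n_iter=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximize fitness(mask) over 0/1 masks with a basic binary PSO."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_particles)]
    vel = [[0.0] * n_bits for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_bits):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the velocity so the sigmoid does not saturate
                v = max(-6.0, min(6.0, v))
                vel[i][d] = v
                # Sigmoid of the velocity gives the probability that the bit is 1
                prob_one = 1.0 / (1.0 + math.exp(-v))
                pos[i][d] = 1 if rng.random() < prob_one else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


# Hypothetical fitness: reward two "relevant" features, penalize subset size
RELEVANT = (0, 3)


def toy_fitness(mask):
    return sum(mask[j] for j in RELEVANT) - 0.1 * sum(mask)


best_mask, best_fit = bpso(toy_fitness, n_bits=6)
```

In a real feature selection setting the fitness would be a classifier score or a filter criterion evaluated on the masked feature set, which is where most of the computational cost lies.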


2016 ◽  
Vol 6 (1) ◽  
pp. 11-24
Author(s):  
Muhammad A. Sulaiman ◽  
Jane Labadin

Mutual Information (MI) is an information theory concept that has recently been widely used as a criterion for feature selection methods. This is due to its ability to capture both linear and non-linear dependency relationships between two variables. In theory, mutual information is formulated in terms of the probability density functions (pdfs) or entropies of the two variables. In most machine learning applications, mutual information estimation is formulated for classification problems (that is, data with labeled output). This study investigates the use of mutual information estimation as a feature selection criterion for regression tasks and introduces an enhancement in selecting the optimal feature subset based on previous works. Specifically, while focusing on regression tasks, it builds on previous work in which a scientifically sound stopping criterion for greedy feature selection algorithms was proposed. Four real-world regression datasets were used in this study: three are public datasets obtained from the UCI machine learning repository and the remaining one is a private well log dataset. Two machine learning models, namely multiple regression and artificial neural networks (ANN), were used to test the performance of IFSMIR. The results obtained demonstrate the effectiveness of the proposed method.
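
For reference, the textbook formulation mentioned in this abstract can be written as follows for continuous variables X and Y with joint density p(x, y) and marginals p(x) and p(y); the notation is the standard one and is not taken from the paper itself.

```latex
I(X;Y) = \iint p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}\,dx\,dy
       = H(X) + H(Y) - H(X,Y),
\qquad
H(X) = -\int p(x)\,\log p(x)\,dx .
```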


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Houda Amazal ◽  
Mohamed Kissi

Feature selection (FS) is a fundamental task for text classification problems. Text feature selection aims to represent documents using the most relevant features. This process can reduce the size of datasets and improve the performance of machine learning algorithms. Many researchers have focused on elaborating efficient FS techniques. However, most of the proposed approaches are evaluated on small datasets and validated using single machines. As textual data dimensionality becomes higher, traditional FS methods must be improved and parallelized to handle textual big data. This paper proposes a distributed approach for feature selection based on the mutual information (MI) method, which is widely applied in pattern recognition and machine learning. A drawback of MI is that it ignores the frequency of the terms during the selection of features. The proposal introduces a distributed FS method, namely Maximum Term Frequency-Mutual Information (MTF-MI), based on term frequency and mutual information techniques to improve the quality of the selected features. The proposed approach is implemented on Hadoop using the MapReduce programming model. The effectiveness of MTF-MI is demonstrated through several text classification experiments using the multinomial Naïve Bayes classifier on three datasets. Through a series of tests, the results reveal that the proposed MTF-MI method improves the classification results compared with four state-of-the-art methods in terms of macro-F1 and micro-F1 measures.


2014 ◽  
Vol 507 ◽  
pp. 806-809
Author(s):  
Shu Fang Li ◽  
Qin Jia ◽  
Hong Liang

To provide a real-time automatic classification method for red tide algae with a high accuracy rate, this paper proposes using ReliefF-SBS for feature selection. Feature analysis is first performed on the original data set of red tide algae images. On this basis, feature selection removes the irrelevant and redundant features from the original feature set to obtain the optimal feature subset and reduce their impact on classification accuracy. The classification results of two classifiers, SVM and KNN, are then compared before and after feature selection.


Author(s):  
Ilangovan Sangaiya ◽  
A. Vincent Antony Kumar

In data mining, feature selection is required to select relevant features and to remove unimportant or irrelevant features from an original data set based on some evaluation criteria. Filter and wrapper are the two methods commonly used, but here the authors propose a hybrid feature selection method that takes advantage of both. The proposed method uses symmetrical uncertainty and genetic algorithms to select the optimal feature subset. This improves processing time by reducing the dimension of the data set without compromising classification accuracy. The proposed hybrid algorithm is much faster than most existing algorithms and scales well with the data set in terms of selected features, classification accuracy and running time.
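
Symmetrical uncertainty, the filter criterion named in this abstract, is conventionally defined as SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)). A minimal sketch for discrete variables is given below; the function names and toy data are assumptions, and the genetic-algorithm stage of the hybrid method is not shown.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy in bits of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(xs, ys):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx = entropy(xs)
    hy = entropy(ys)
    if hx + hy == 0:
        # Both variables are constant, so there is nothing to share
        return 0.0
    joint = entropy(list(zip(xs, ys)))
    mi = hx + hy - joint
    return 2.0 * mi / (hx + hy)


# Hypothetical toy data: one feature that mirrors the label, one that does not
label = [0, 0, 1, 1]
su_informative = symmetrical_uncertainty(["a", "a", "b", "b"], label)
su_uninformative = symmetrical_uncertainty(["a", "b", "a", "b"], label)
```

Normalizing by the entropies compensates for the bias of raw mutual information toward features with many distinct values.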


2021 ◽  
Vol 9 (1) ◽  
pp. 595-603
Author(s):  
Shivangi Srivastav ◽ 
Rajiv Ranjan Tewari

Speech is a significant attribute for distinguishing a person in daily human-to-human interaction and communication. Like other biometric measures, such as face, iris and fingerprints, voice can therefore be used as a biometric measure for recognizing or identifying a person. Speaker recognition is a kind of voice recognition in which the speaker is identified from the utterance rather than from the message. Automatic Speaker Recognition (ASR) identifies people based on features extracted from speech utterances. Speech signals are a powerful communication medium that constantly conveys rich and useful information, such as a speaker's emotion, gender, accent, and other distinctive attributes. In any speaker identification system, the essential task is to extract useful features and build meaningful patterns for speaker models. A theoretical description, a categorization of emotional states and the modalities of emotion expression are presented. A SER framework is developed to conduct this investigation, based on different classifiers and different feature extraction techniques. In this work, various machine learning algorithms are investigated to identify decision boundaries in the feature space of audio signals. Moreover, the novelty of this work lies in improving the performance of classical machine learning algorithms using information theory based feature selection methods. The highest accuracy obtained is 96 percent, using the Random Forest algorithm combined with the Joint Mutual Information feature selection method.


2020 ◽  
Vol 4 (1) ◽  
pp. 29
Author(s):  
Sasan Sarbast Abdulkhaliq ◽  
Aso Mohammad Darwesh

Nowadays, people from every part of the world use social media and social networks to express their feelings toward different topics and aspects. One of the most popular social media platforms is Twitter, a microblogging website that lets its users publicly share their views and feelings about products, services, events, etc. This makes Twitter one of the most valuable sources from which researchers and developers can collect and analyze data to reveal people's sentiment about different topics, such as products of commercial companies, services, and well-known people such as politicians and athletes, by classifying those sentiments as positive or negative. Classification of people's sentiment can be automated using machine learning algorithms and enhanced using appropriate feature selection methods. We collected the most recent tweets about (Amazon, Trump, Chelsea FC, CR7) using the Twitter Application Programming Interface and assigned sentiment scores using a lexicon rule-based approach. We then proposed a machine learning model to improve classification accuracy using a hybrid feature selection method, namely the filter-based Chi-square (Chi-2) method plus the wrapper-based binary coordinate ascent method (Chi-2 + BCA), to select the optimal subset of features from term frequency-inverse document frequency (TF-IDF) generated features for classification with a support vector machine (SVM), and from bag-of-words generated features for logistic regression (LR) classifiers, using different n-gram ranges. After comparing the hybrid (Chi-2 + BCA) method with (Chi-2) selected features, and also with the classifiers without feature subset selection, results show that the hybrid feature selection method increases classification accuracy in all cases. The maximum attained accuracy is 86.55% with LR using the (1 + 2 + 3-gram) range and 85.575% with SVM using the unigram range, both on the CR7 dataset.
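
The Chi-square filter stage can be illustrated with the standard 2x2 term/class contingency statistic. The sketch below scores terms by presence or absence in tokenized documents; the helper names and the four toy "tweets" are hypothetical, and neither the TF-IDF weighting nor the BCA wrapper stage is reproduced.

```python
def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/class contingency table.

    n11: documents with the term, in the class
    n10: documents with the term, not in the class
    n01: documents without the term, in the class
    n00: documents without the term, not in the class
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        # A row or column is empty, so the term cannot discriminate
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def term_scores(docs, labels, positive):
    """Score every vocabulary term against the class named by `positive`."""
    vocab = sorted({term for doc in docs for term in doc})
    scores = {}
    for term in vocab:
        n11 = n10 = n01 = n00 = 0
        for tokens, label in zip(docs, labels):
            has_term = term in tokens
            in_class = label == positive
            if has_term and in_class:
                n11 += 1
            elif has_term:
                n10 += 1
            elif in_class:
                n01 += 1
            else:
                n00 += 1
        scores[term] = chi_square(n11, n10, n01, n00)
    return scores


# Hypothetical tokenized tweets with lexicon-assigned labels
docs = [{"great", "goal"}, {"great", "win"}, {"bad", "goal"}, {"bad", "loss"}]
labels = ["pos", "pos", "neg", "neg"]
scores = term_scores(docs, labels, positive="pos")
```

Terms that occur equally often in both classes (here "goal") score zero and would be dropped first, while terms tied to one class (here "great" and "bad") score highest.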


2014 ◽  
Vol 2014 ◽  
pp. 1-11 ◽  
Author(s):  
Zilin Zeng ◽  
Hongjun Zhang ◽  
Rui Zhang ◽  
Youliang Zhang

We introduce a novel hybrid feature selection method based on rough conditional mutual information and a naive Bayesian classifier. Conditional mutual information is an important metric in feature selection, but it is hard to compute. We introduce a new measure called rough conditional mutual information, which is based on rough sets; it is shown that the new measure can substitute for Shannon’s conditional mutual information. Thus rough conditional mutual information can also be used to filter out irrelevant and redundant features. Subsequently, to reduce the number of features and improve classification accuracy, a wrapper approach based on the naive Bayesian classifier is used to search for the optimal feature subset within the candidate feature subset selected by the filter model. Finally, the proposed algorithms are tested on several UCI datasets and compared with other classical feature selection methods. The results show that our approach obtains not only high classification accuracy but also the smallest number of selected features.
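
To make the quantity concrete, the sketch below estimates Shannon's conditional mutual information I(X; Y | Z) from counts over discrete variables. It is the classical measure, not the rough-set variant proposed in the abstract, and the function name and toy sequences are assumptions.

```python
from collections import Counter
from math import log2


def conditional_mutual_information(xs, ys, zs):
    """Estimate I(X; Y | Z) in bits from three equal-length discrete sequences."""
    n = len(xs)
    count_z = Counter(zs)
    count_xz = Counter(zip(xs, zs))
    count_yz = Counter(zip(ys, zs))
    count_xyz = Counter(zip(xs, ys, zs))
    cmi = 0.0
    for (x, y, z), c in count_xyz.items():
        # p(x,y,z) * log2(p(z) p(x,y,z) / (p(x,z) p(y,z))), written with raw counts
        cmi += (c / n) * log2(c * count_z[z] / (count_xz[(x, z)] * count_yz[(y, z)]))
    return cmi


# Redundant case: X duplicates the already selected Z, so it adds nothing about Y
cmi_redundant = conditional_mutual_information([0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1])

# Complementary case: Y = X xor Z, so X is informative only once Z is known
cmi_complementary = conditional_mutual_information([0, 0, 1, 1], [0, 1, 1, 0], [0, 1, 0, 1])
```

The need to estimate the joint distribution of three variables is what makes this measure expensive and data-hungry as the number of distinct values grows.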


2019 ◽  
Vol 2019 ◽  
pp. 1-11 ◽  
Author(s):  
Jiadong Ren ◽  
Jiawei Guo ◽  
Wang Qian ◽  
Huang Yuan ◽  
Xiaobing Hao ◽  
...  

An intrusion detection system (IDS) can effectively identify anomalous behaviors in a network; however, it still has a low detection rate and a high false alarm rate, especially for anomalies with fewer records. In this paper, we propose an effective IDS, called DO_IDS, that uses hybrid data optimization consisting of two parts: data sampling and feature selection. In data sampling, the Isolation Forest (iForest) is used to eliminate outliers, a genetic algorithm (GA) to optimize the sampling ratio, and the Random Forest (RF) classifier as the evaluation criterion to obtain the optimal training dataset. In feature selection, GA and RF are used again to obtain the optimal feature subset. Finally, an intrusion detection system based on RF is built using the optimal training dataset obtained by data sampling and the features selected by feature selection. The experiments are carried out on the UNSW-NB15 dataset. Compared with other algorithms, the model has obvious advantages in detecting rare anomalous behaviors.

