Binary Bat Algorithm for text feature selection in news events detection model using Markov clustering

2021 ◽  
Vol 9 (1) ◽  
Author(s):  
Wafa Zubair Al-Dyani ◽  
Farzana Kabir Ahmad ◽  
Siti Sakira Kamaruddin
2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Joffrey L. Leevy ◽  
John Hancock ◽  
Richard Zuech ◽  
Taghi M. Khoshgoftaar

Machine learning algorithms efficiently trained on intrusion detection datasets can detect network traffic capable of jeopardizing an information system. In this study, we use the CSE-CIC-IDS2018 dataset to investigate the effect of ensemble feature selection on the performance of seven classifiers. CSE-CIC-IDS2018 is big data (about 16,000,000 instances), publicly available, modern, and covers a wide range of realistic attack types. Our contribution is centered on answering three research questions. The first question is, “Does feature selection impact the performance of classifiers in terms of Area Under the Receiver Operating Characteristic Curve (AUC) and F1-score?” The second question is, “Does including the Destination_Port categorical feature significantly impact the performance of LightGBM and CatBoost in terms of AUC and F1-score?” The third question is, “Does the choice of classifier (Decision Tree (DT), Random Forest (RF), Naive Bayes (NB), Logistic Regression (LR), CatBoost, LightGBM, or XGBoost) significantly impact performance in terms of AUC and F1-score?” All three research questions are answered in the affirmative, providing valuable, practical information for the development of an efficient intrusion detection model. To the best of our knowledge, we are the first to use an ensemble feature selection technique with the CSE-CIC-IDS2018 dataset.
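
The abstract does not say how the members of the feature-selection ensemble are combined. One common scheme, assumed here purely for illustration and not taken from the study, is rank aggregation: each base ranker orders the features from best to worst, the features are re-ranked by their mean position across rankers, and the top k are kept. The function name `aggregate_rankings`, the feature names `f1` to `f4`, and the three rankings are all hypothetical.

```python
from statistics import mean


def aggregate_rankings(rankings, k):
    """Hypothetical rank-aggregation ensemble: merge several feature rankings
    (best feature first) by mean rank position and return the top-k features."""
    features = set(rankings[0])
    if any(set(r) != features for r in rankings):
        raise ValueError("every ranker must rank the same feature set")
    # Position of each feature in each ranking (0 = best).
    positions = [{f: i for i, f in enumerate(r)} for r in rankings]
    avg_rank = {f: mean(p[f] for p in positions) for f in features}
    # Lower mean rank is better; ties are broken by name for determinism.
    combined = sorted(features, key=lambda f: (avg_rank[f], f))
    return combined[:k]


# Hypothetical rankings produced by three base rankers over four features.
rankings = [
    ["f1", "f2", "f3", "f4"],
    ["f2", "f1", "f4", "f3"],
    ["f1", "f4", "f2", "f3"],
]
print(aggregate_rankings(rankings, 2))  # ['f1', 'f2']
```

Mean rank is only one aggregation rule; voting or score averaging are equally plausible choices, and the abstract does not indicate which one the study used.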


2012 ◽  
Vol 532-533 ◽  
pp. 1191-1195 ◽  
Author(s):  
Zhen Yan Liu ◽  
Wei Ping Wang ◽  
Yong Wang

This paper presents the design of a text categorization system based on the Support Vector Machine (SVM). It analyzes the high-dimensional nature of text data and explains why SVMs are well suited to text categorization. The system is constructed according to its data flow and consists of three subsystems: text representation, classifier training, and text classification. Classifier training is the core of the system, but text representation directly affects the quality of the classifier and the overall performance of the system. The text feature vector space can be built with various feature selection and feature extraction methods. Because no study has established which method is best, the system implements many feature selection and feature extraction methods. For a given classification task, every feature selection method and every feature extraction method is tested, and the best-performing set of methods is adopted.
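
The abstract does not list which feature selection methods the system implements. As one example of the kind of method such a system might test, the sketch below assumes the chi-square statistic, a widely used term-scoring criterion in text categorization. For a term and a class, with A = documents in the class that contain the term, B = documents outside the class that contain it, C = documents in the class that lack it, D = the remaining documents, and N = A + B + C + D, the score is N(AD − CB)² / ((A + C)(B + D)(A + B)(C + D)). The top-scoring terms define the vector space, and each document becomes a term-frequency vector over them. The whitespace tokenizer, the function names, and the toy documents are hypothetical, and the SVM training subsystem is not shown.

```python
from collections import Counter


def tokenize(text):
    # Deliberately simple (assumption): lowercase and split on whitespace.
    return text.lower().split()


def chi_square_scores(docs):
    """docs: list of (text, label) pairs. Returns {term: score}, where the
    score is the maximum term/class chi-square statistic over all classes."""
    n = len(docs)
    doc_terms = [(set(tokenize(text)), label) for text, label in docs]
    classes = {label for _, label in doc_terms}
    vocab = set().union(*(terms for terms, _ in doc_terms))
    scores = {}
    for term in vocab:
        best = 0.0
        for cls in classes:
            # 2x2 contingency counts for (term present?, document in cls?).
            a = sum(term in t and lab == cls for t, lab in doc_terms)
            b = sum(term in t and lab != cls for t, lab in doc_terms)
            c = sum(term not in t and lab == cls for t, lab in doc_terms)
            d = n - a - b - c
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:
                best = max(best, n * (a * d - c * b) ** 2 / denom)
        scores[term] = best
    return scores


def select_terms(docs, k):
    scores = chi_square_scores(docs)
    # Highest score first; alphabetical order breaks ties.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def to_vector(text, vocabulary):
    # Term-frequency vector restricted to the selected vocabulary.
    counts = Counter(tokenize(text))
    return [counts[t] for t in vocabulary]


docs = [
    ("cheap pills buy now", "spam"),
    ("buy cheap watches", "spam"),
    ("meeting agenda for monday", "ham"),
    ("monday project meeting notes", "ham"),
]
vocab = select_terms(docs, 4)
print(vocab)                                      # ['buy', 'cheap', 'meeting', 'monday']
print(to_vector("buy cheap cheap stuff", vocab))  # [1, 2, 0, 0]
```

Swapping `chi_square_scores` for another scoring function, and comparing the resulting classifiers, mirrors the "test every method, keep the best" procedure the abstract describes.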


2017 ◽  
Vol E100.D (8) ◽  
pp. 1860-1869 ◽  
Author(s):  
Bin YANG ◽  
Yuliang LU ◽  
Kailong ZHU ◽  
Guozheng YANG ◽  
Jingwei LIU ◽  
...  

2017 ◽  
Vol 9 (1) ◽  
pp. 168781401668529 ◽  
Author(s):  
Sheng-wei Fei

In this article, a bearing fault diagnosis method based on a relevance vector machine (RVM) classifier with an improved binary bat algorithm (IBBA) is proposed; the IBBA is used to select both the appropriate features and the kernel parameter of the RVM. In the IBBA, a new velocity-updating method for the bats is introduced. When an element of a bat's position vector equals the corresponding element of the current best location, the probability of changing that element decreases; when the two elements are unequal, the probability increases. This strengthens the optimization ability of the binary bat algorithm. For comparison, a traditional RVM is trained on samples with the full (unreduced) feature set. The experimental results indicate that the IBBA–RVM method has a stronger bearing fault diagnosis ability than the traditional RVM trained on unreduced features, showing that bearing fault diagnosis based on IBBA–RVM is feasible.
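
The abstract states the intent of the new velocity update but not its formula, so the sketch below is a hypothetical rule written only to illustrate that intent; it is not the paper's update. It keeps the V-shaped transfer function |(2/π)·arctan((π/2)·v)| of the standard binary bat algorithm, where the velocity update v ← v + (x − x*)·f leaves the velocity unchanged whenever a bit already equals the best bat's bit x*. Here, instead, a matching bit has its velocity multiplied by an assumed damping factor `decay` (lowering its flip probability), and a mismatched bit has its velocity magnitude increased by the frequency `freq` (raising its flip probability). Only the feature mask is encoded; the RVM kernel parameter and the fitness evaluation are omitted, and `update_bat` is an illustrative name.

```python
import math
import random


def v_transfer(v):
    # V-shaped transfer function from the standard binary bat algorithm:
    # maps a velocity to a probability of flipping the bit.
    return abs(2 / math.pi * math.atan(math.pi / 2 * v))


def update_bat(position, velocity, best, freq, decay=0.5, rng=random):
    """One hypothetical position update for a single bat.

    position, best: lists of 0/1 bits (1 = feature selected).
    velocity: list of floats, one per bit. freq: the bat's frequency.
    Bits that match the best bat are damped (less likely to flip away);
    mismatched bits are boosted (more likely to flip toward the best bit).
    """
    new_pos, new_vel = [], []
    for x, v, g in zip(position, velocity, best):
        if x == g:
            v = v * decay          # damp: stay close to the best bit
        else:
            v = abs(v) + freq      # boost: move toward the best bit
        new_vel.append(v)
        flip = rng.random() < v_transfer(v)
        new_pos.append(1 - x if flip else x)
    return new_pos, new_vel


rng = random.Random(42)
pos, vel = update_bat([1, 0, 1, 0], [0.5, 0.5, 0.0, 0.0], [1, 1, 1, 1],
                      freq=1.0, rng=rng)
print(vel)  # [0.25, 1.5, 0.0, 1.0]
```

A bit whose velocity is exactly zero has a flip probability of zero, so the third bit above always keeps the best bat's value.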


Entropy ◽  
2019 ◽  
Vol 21 (6) ◽  
pp. 602 ◽  
Author(s):  
Jaesung Lee ◽  
Jaegyun Park ◽  
Hae-Cheon Kim ◽  
Dae-Won Kim

Multi-label feature selection is an important task in text categorization because it enables learning algorithms to focus on the essential features that indicate relevant categories, thereby improving categorization accuracy. Recent studies have hybridized evolutionary feature wrappers with filters to enhance the evolutionary search process. However, they have not considered the relative effectiveness of the feature subset searches performed by the evolutionary and the filter operators, which results in degenerate final feature subsets. In this paper, we propose a novel hybridization approach based on competition between the operators. Unlike conventional methods, this enables the proposed algorithm to apply each operator selectively and to modify the feature subset according to that operator's relative effectiveness. The experimental results on 16 text datasets verify that the proposed method is superior to conventional methods.
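
The abstract does not describe how the operators compete. As a hypothetical illustration only, the sketch below uses a simple adaptive operator selection scheme: at each step an operator is chosen with probability proportional to its smoothed success rate, applied to the current best subset, and credited only if it improves fitness. The two operators (a random bit-flip "evolutionary" move and a "filter" move that adds the highest-scoring unselected feature), the filter scores, and the size-penalized fitness are all assumptions; a real multi-label setting would score subsets with a multi-label evaluation measure instead.

```python
import random


def compete(operators, fitness, start, iterations, rng):
    """Hypothetical operator competition: pick an operator with probability
    proportional to its smoothed success rate, apply it to the current best
    subset, and credit it only when the candidate improves fitness."""
    wins = {name: 1.0 for name in operators}    # smoothed success counts
    tries = {name: 2.0 for name in operators}   # smoothed trial counts
    best, best_fit = start, fitness(start)
    for _ in range(iterations):
        names = list(operators)
        weights = [wins[n] / tries[n] for n in names]
        name = rng.choices(names, weights=weights)[0]
        candidate = operators[name](best, rng)
        tries[name] += 1
        cand_fit = fitness(candidate)
        if cand_fit > best_fit:
            wins[name] += 1
            best, best_fit = candidate, cand_fit
    return best, best_fit, wins


# Toy setup (hypothetical): six features with made-up filter scores.
scores = [0.9, 0.1, 0.7, 0.2, 0.6, 0.05]
n = len(scores)


def fitness(subset):
    # Stand-in for a wrapper evaluation: reward relevance, penalize size.
    return sum(scores[i] for i in subset) - 0.3 * len(subset)


def mutate(subset, rng):
    # Evolutionary operator: flip one randomly chosen feature in or out.
    return subset ^ {rng.randrange(n)}


def filter_add(subset, rng):
    # Filter operator: add the unselected feature with the highest score.
    unselected = [i for i in range(n) if i not in subset]
    if not unselected:
        return subset
    return subset | {max(unselected, key=lambda i: scores[i])}


best, best_fit, wins = compete({"mutate": mutate, "filter": filter_add},
                               fitness, frozenset(), 200, random.Random(0))
print(sorted(best), round(best_fit, 3), wins)
```

Because an operator's selection weight grows with its record of improvements, whichever operator is currently more effective is applied more often, which is the behavior the abstract attributes to its competition mechanism.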

