Northern Bald Ibis Algorithm-Based Novel Feature Selection Approach

Author(s):  
Ravi Kumar Saidala

Email has become one of the most popular and flexible web- and mobile-based applications enabling users to communicate. For decades, the most severe problem in email applications has been unwanted mail: electronic spam, also referred to as spam email, consists of unsolicited and unwanted messages. Keeping a mailbox clean by detecting and eliminating all spam is a challenging task. Classification-based email filtering is one of the most widely used approaches to the spam-filtering problem. In this work, the NOA optimization algorithm and an SVM classifier are used to obtain an optimal feature subset of the Enron-Spam dataset and to classify that subset. NOA is a recently developed metaheuristic algorithm that mimics the energy-saving flight pattern of the Northern Bald Ibis (family Threskiornithidae). Performance comparisons have been made with other existing methods, and the analysis and comparison of the classification results make the superiority of the proposed novel feature selection approach evident.
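The abstract does not detail NOA's update rules, but the wrapper idea it relies on, scoring a candidate feature mask by the accuracy a classifier reaches using only the selected features, can be sketched in a few lines. The sketch below uses a leave-one-out 1-NN classifier as a lightweight stand-in for the paper's SVM; the function name and toy data are illustrative.

```python
import math

def loo_1nn_accuracy(X, y, mask):
    """Leave-one-out accuracy of a 1-NN classifier restricted to the
    features where mask[j] == 1 (a stand-in for the paper's SVM)."""
    idx = [j for j, m in enumerate(mask) if m]
    if not idx:
        return 0.0
    correct = 0
    for i in range(len(X)):
        best, best_d = None, float("inf")
        for k in range(len(X)):
            if k == i:
                continue
            d = math.dist([X[i][j] for j in idx], [X[k][j] for j in idx])
            if d < best_d:
                best, best_d = y[k], d
        correct += (best == y[i])
    return correct / len(X)

# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 5.0], [0.1, -3.0], [0.2, 4.0], [1.0, -4.0], [1.1, 3.0], [0.9, -5.0]]
y = [0, 0, 0, 1, 1, 1]

print(loo_1nn_accuracy(X, y, [1, 0]))  # informative feature only -> 1.0
print(loo_1nn_accuracy(X, y, [0, 1]))  # noise feature only -> 0.5
```

A metaheuristic such as NOA would iterate over candidate masks and keep the one that maximizes this score.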

2013 ◽  
Vol 380-384 ◽  
pp. 1593-1599
Author(s):  
Hao Yan Guo ◽  
Da Zheng Wang

The traditional motivation behind feature selection algorithms is to find the best subset of features for a task using one particular learning algorithm. However, it has often been found that no single classifier is entirely satisfactory for a particular task. How to further improve the performance of such single systems on the basis of a previously obtained optimal feature subset is therefore an important issue. We investigate the notion of optimal feature selection and present a practical feature selection approach based on the optimal feature subset of a single CAD system, referred to in this paper as the multilevel optimal feature selection method (MOFS). Through MOFS, we select different optimal feature subsets in order to eliminate redundant or irrelevant features and obtain optimal features.


Author(s):  
Hui Wang ◽  
Li Li Guo ◽  
Yun Lin

Automatic modulation recognition is very important for receiver design in broadband multimedia communication systems, and reasonable signal feature extraction and selection algorithms are the key technology of digital multimedia signal recognition. In this paper, information entropy is used to extract single features, namely power spectrum entropy, wavelet energy spectrum entropy, singular spectrum entropy and Renyi entropy. Then, a distance-measure feature selection algorithm and Sequential Feature Selection (SFS) are presented to select the optimal feature subset. Finally, a BP neural network is used to classify the signal modulation. The simulation results show that the four different information-entropy features can be used to classify different signal modulations, and that the feature selection algorithm successfully chooses the optimal feature subset and achieves the best performance.
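Of the four entropy features, power spectrum entropy is the simplest to illustrate: the Shannon entropy of the signal's normalized power spectrum. Below is a minimal stdlib-only sketch (naive DFT, not necessarily the paper's exact formulation):

```python
import cmath, math

def power_spectrum_entropy(signal):
    """Shannon entropy of the normalized power spectrum of a signal
    (naive DFT; a flat spectrum gives high entropy, a pure tone low)."""
    n = len(signal)
    power = []
    for k in range(n):
        s = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
        power.append(abs(s) ** 2)
    total = sum(power)
    probs = [p / total for p in power if p > 1e-12]
    return -sum(p * math.log(p) for p in probs)

# A pure tone concentrates spectral power -> low entropy;
# an impulse has a flat spectrum -> high entropy.
tone = [math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]
impulse = [1.0] + [0.0] * 31
print(power_spectrum_entropy(tone) < power_spectrum_entropy(impulse))  # True
```

Narrowband modulations concentrate power in few bins (low entropy) while noise-like signals spread it (high entropy), which is what makes this feature discriminative between modulation types.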


2016 ◽  
Vol 2016 ◽  
pp. 1-10 ◽  
Author(s):  
Zhi Chen ◽  
Tao Lin ◽  
Ningjiu Tang ◽  
Xin Xia

The extensive application of support vector machines (SVMs) requires an efficient method of constructing an SVM classifier with high classification ability. The performance of an SVM crucially depends on whether the optimal feature subset and SVM parameters can be obtained efficiently. In this paper, a coarse-grained parallel genetic algorithm (CGPGA) is used to simultaneously optimize the feature subset and the parameters of the SVM. The distributed topology and migration policy of the CGPGA help find the optimal feature subset and parameters in significantly less time, thereby increasing the quality of the solution found. In addition, a new fitness function, which combines the classification accuracy obtained from the bootstrap method, the number of chosen features, and the number of support vectors, is proposed to steer the search of the CGPGA toward the optimal generalization error. Experimental results on 12 benchmark datasets show that our proposed approach outperforms the genetic algorithm (GA) based method and the grid search method in terms of classification accuracy, number of chosen features, number of support vectors, and running time.
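The abstract names the three ingredients of the fitness function but not how they are weighted; a hypothetical weighted-sum version, with purely illustrative weights and parameter names, might look like this:

```python
def cgpga_fitness(accuracy, n_selected, n_features, n_sv, n_samples,
                  w_acc=0.8, w_feat=0.1, w_sv=0.1):
    """Hypothetical fitness in the spirit of the paper: reward bootstrap
    accuracy, penalize many selected features and many support vectors
    (the weights here are illustrative, not taken from the paper)."""
    return (w_acc * accuracy
            + w_feat * (1 - n_selected / n_features)
            + w_sv * (1 - n_sv / n_samples))

# With equal accuracy, the leaner model (fewer features, fewer SVs) wins.
lean = cgpga_fitness(accuracy=0.9, n_selected=5, n_features=20,
                     n_sv=30, n_samples=100)
fat = cgpga_fitness(accuracy=0.9, n_selected=15, n_features=20,
                    n_sv=80, n_samples=100)
print(lean > fat)  # True
```

Penalizing the support-vector count is what pushes the search toward solutions with lower expected generalization error, since SVM generalization bounds grow with the number of support vectors.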


Feature selection in high-dimensional multispectral data is a hard machine learning problem because of the imbalanced classes present in the data. Most of the existing feature selection schemes in the literature ignore the class-imbalance problem by choosing features from the classes with more instances while overlooking significant features of the classes with fewer instances. In this paper, the SMOTE concept is exploited to produce the required samples from the minority classes. A feature selection model is formulated with the objective of reducing the number of features while improving classification performance. The model achieves dimensionality reduction by opting for a subset of relevant spectral, textural and spatial features while eliminating redundant features, for the purpose of improved classification performance. A binary ALO is employed to solve the feature selection model for the optimal selection of features. The proposed ALO-SVM wrapper concept is applied to each potential solution obtained during the optimization step. The methodology is tested on a LANDSAT multispectral image.
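The SMOTE step can be sketched compactly: a synthetic minority sample is an interpolation between a minority point and one of its nearest minority neighbours. A minimal stdlib version (function name, parameters and toy data are illustrative):

```python
import math, random

def smote_sample(minority, k=2, rng=None):
    """Generate one synthetic minority sample, SMOTE-style: pick a point,
    pick one of its k nearest minority neighbours, and interpolate at a
    random position on the segment between them."""
    rng = rng or random.Random(0)
    x = rng.choice(minority)
    neighbours = sorted((p for p in minority if p is not x),
                        key=lambda p: math.dist(p, x))[:k]
    nb = rng.choice(neighbours)
    gap = rng.random()  # interpolation factor in [0, 1)
    return [xi + gap * (ni - xi) for xi, ni in zip(x, nb)]

minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [1.1, 1.2]]
synthetic = smote_sample(minority)
# The synthetic point lies on a segment between two minority points,
# so each coordinate stays within the minority class's bounding box.
print(all(min(p[j] for p in minority) <= synthetic[j]
          <= max(p[j] for p in minority) for j in range(2)))  # True
```

Generating such samples until the minority classes match the majority class in size gives the binary ALO wrapper a balanced training set to evaluate candidate feature subsets on.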


Optimal feature subset selection over very high-dimensional data is a vital issue. Even when optimal features are selected, classifying them remains a complicated task. To handle these problems, a novel Accelerated Simulated Annealing and Mutation Operator (ASAMO) feature selection algorithm is proposed in this work. To solve the classification problem, the Fuzzy Minimal Consistent Class Subset Coverage (FMCCSC) problem is introduced. In FMCCSC, a consistent subset is combined with a K-Nearest Neighbour (KNN) classifier, known as the FMCCSC-KNN classifier. Experiments on optimal feature selection and classification are carried out on two datasets, Dorothea and Madelon, from the UCI Machine Learning Repository. The experimental results substantiate the efficiency of the proposed ASAMO with the FMCCSC-KNN classifier compared to the Particle Swarm Optimization (PSO) and Accelerated PSO feature selection algorithms.
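The abstract does not specify ASAMO's acceleration scheme, but the plain simulated annealing over binary feature masks with a one-bit mutation operator that it builds on can be sketched as follows (all parameter values and the toy objective are illustrative):

```python
import math, random

def anneal_mask(fitness, n_bits, steps=300, t0=1.0, cooling=0.95, seed=1):
    """Simulated annealing over binary feature masks with a one-bit
    mutation operator (a generic sketch; ASAMO's acceleration and exact
    cooling schedule are not given in the abstract)."""
    rng = random.Random(seed)
    mask = [rng.randint(0, 1) for _ in range(n_bits)]
    cur_f = fitness(mask)
    best, best_f = mask[:], cur_f
    t = t0
    for _ in range(steps):
        cand = mask[:]
        cand[rng.randrange(n_bits)] ^= 1          # mutate one bit
        cand_f = fitness(cand)
        # Accept improvements always, worse moves with Boltzmann probability.
        if cand_f > cur_f or rng.random() < math.exp((cand_f - cur_f) / t):
            mask, cur_f = cand, cand_f
            if cur_f > best_f:
                best, best_f = mask[:], cur_f
        t *= cooling
    return best, best_f

# Toy objective: the first 3 bits are "useful", the rest are penalized.
fit = lambda m: sum(m[:3]) - 0.5 * sum(m[3:])
best, best_f = anneal_mask(fit, n_bits=8)
print(best, best_f)
```

In the paper's setting, `fitness` would be the FMCCSC-KNN classification score of the subset encoded by the mask rather than this toy objective.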


Twitter sentiment analysis is a vital concept in determining public opinion about products, services, events or personalities. Analyzing medical tweets on a specific topic can provide immense benefits to the medical industry. However, medical tweets require an efficient feature selection approach to produce significantly accurate results. The penguin search optimization algorithm (PeSOA) has the ability to resolve NP-hard problems. This paper aims at developing an automated opinion mining framework by modeling feature selection as an NP-hard optimization problem and solving it with a PeSOA-based feature selection approach. Initially, medical tweets are extracted using cancer and drug keywords and pre-processed to filter the relevant informative tweets. Features are then extracted based on Natural Language Processing (NLP) concepts, and the optimal features are selected using PeSOA, whose results are fed as input to three baseline classifiers to achieve optimal and accurate sentiment classification. The experimental results obtained through MATLAB simulations on cancer and drug tweets using k-Nearest Neighbor (KNN), Naïve Bayes (NB) and Support Vector Machine (SVM) indicate that the proposed PeSOA-based feature selection for tweet opinion mining improves classification performance significantly, and that PeSOA feature selection with the SVM classifier provides superior sentiment classification to the other classifiers.
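The keyword-based extraction and pre-processing step can be sketched with a few regular expressions; the keyword list and cleaning rules below are illustrative stand-ins, not the framework's exact ones:

```python
import re

KEYWORDS = {"cancer", "drug", "chemo", "tumor"}   # illustrative keyword list

def preprocess(tweet):
    """Minimal tweet cleaning: lower-case, strip URLs and @mentions,
    then keep only alphabetic tokens (a sketch of the kind of
    pre-processing the framework describes, not its exact rules)."""
    tweet = re.sub(r"https?://\S+|@\w+", " ", tweet.lower())
    return re.findall(r"[a-z]+", tweet)

def is_relevant(tweet):
    """Keep only tweets mentioning at least one medical keyword."""
    return bool(KEYWORDS & set(preprocess(tweet)))

print(preprocess("New #cancer drug trial @NIH http://t.co/xyz"))
# ['new', 'cancer', 'drug', 'trial']
print(is_relevant("Lunch was great!"))  # False
```

The surviving tokens would then feed the NLP feature extraction stage, whose output PeSOA prunes to an optimal subset before classification.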


2020 ◽  
Vol 17 (5) ◽  
pp. 721-730
Author(s):  
Kamal Bashir ◽  
Tianrui Li ◽  
Mahama Yahaya

The most frequently used machine learning feature ranking approaches fail to present an optimal feature subset for accurate prediction of defective software modules in out-of-sample data. Machine learning Feature Selection (FS) algorithms such as Chi-Square (CS), Information Gain (IG), Gain Ratio (GR), ReliefF (RF) and Symmetric Uncertainty (SU) perform relatively poorly at prediction, even after balancing the class distribution in the training data. In this study, we propose a novel FS method based on Maximum Likelihood Logistic Regression (MLLR). We apply this method to six software defect datasets, in both their sampled and unsampled forms, to select useful features for classification in the context of Software Defect Prediction (SDP). The Support Vector Machine (SVM) and Random Forest (RaF) classifiers are applied to the FS subsets derived from the sampled and unsampled datasets. The performance of the models, measured using the Area Under the Receiver Operating Characteristic Curve (AUC) metric, is compared for all FS methods considered. The Analysis of Variance (ANOVA) F-test results validate the superiority of the proposed method over all the FS techniques, in both sampled and unsampled data. The results confirm that MLLR can be useful in selecting an optimal feature subset for more accurate prediction of defective modules in the software development process.
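The abstract leaves the MLLR ranking procedure unspecified; one simple reading, scoring each feature by the maximized log-likelihood of a univariate logistic regression fit on it, can be sketched as follows (a stand-in under that assumption, not the authors' exact method):

```python
import math

def univariate_logit_loglik(x, y, iters=500, lr=0.1):
    """Fit a one-feature logistic regression by gradient ascent and
    return its maximized log-likelihood; features can then be ranked
    by this score (higher = more predictive of the defect label)."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(iters):
        gw = gb = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(w * xi + b)))
            gw += (yi - p) * xi
            gb += (yi - p)
        w += lr * gw / n
        b += lr * gb / n
    ll = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(w * xi + b)))
        ll += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return ll

y = [0, 0, 0, 1, 1, 1]
informative = [0.1, 0.2, 0.0, 0.9, 1.1, 1.0]   # tracks the label
noise = [0.5, -0.4, 0.3, -0.2, 0.4, -0.5]      # unrelated to the label
# Rank features by achieved log-likelihood: higher is more useful.
print(univariate_logit_loglik(informative, y) > univariate_logit_loglik(noise, y))  # True
```

Keeping the top-ranked features under such a score yields the FS subsets that the SVM and RaF classifiers are then trained on.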


2019 ◽  
Vol 29 (1) ◽  
pp. 1598-1610 ◽  
Author(s):  
Manosij Ghosh ◽  
Ritam Guha ◽  
Imran Alam ◽  
Priyank Lohariwal ◽  
Devesh Jalan ◽  
...  

Abstract Feature selection (FS) is a technique which helps to find the optimal feature subset for developing an efficient pattern recognition model for the task under consideration. The use of genetic algorithms (GA) and particle swarm optimization (PSO) in the field of FS is profound. In this paper, we propose an insightful way to perform FS by amassing information from the candidate solutions produced by GA and PSO. Our aim is to combine the exploitation ability of GA with the exploration capacity of PSO. We name this new model binary genetic swarm optimization (BGSO). The proposed method initially lets GA and PSO run independently. To extract sufficient information from the feature subsets they obtain, BGSO combines their results using an algorithm called the average weighted combination method to produce an intermediate solution. Thereafter, a local search called sequential one-point flipping is applied to refine the intermediate solution further and generate the final solution. BGSO is applied to 20 popular UCI datasets. The results were obtained with two classifiers, namely k-nearest neighbors (KNN) and the multi-layer perceptron (MLP). The overall results and comparisons show that the proposed method outperforms the constituent algorithms on 16 and 14 datasets using KNN and MLP, respectively, whereas among the constituent algorithms, GA achieves the best classification accuracy on 2 and 7 datasets and PSO on 2 and 4 datasets, respectively, for the same classifiers. This proves the applicability and usefulness of the method in the domain of FS.
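The average weighted combination step can be illustrated concretely: weight each candidate mask by its classification accuracy and keep the features whose weighted vote clears a threshold. A sketch of the idea (the paper's exact formula and threshold may differ):

```python
def average_weighted_combination(solutions, threshold=0.5):
    """Combine candidate feature masks from different optimizers: each
    feature's score is the accuracy-weighted average of its occurrence,
    and features scoring above `threshold` are kept."""
    total = sum(acc for _, acc in solutions)
    n = len(solutions[0][0])
    scores = [sum(acc * mask[j] for mask, acc in solutions) / total
              for j in range(n)]
    return [1 if s > threshold else 0 for s in scores]

# (mask, accuracy) pairs as might come from independent GA and PSO runs.
solutions = [
    ([1, 1, 0, 0], 0.90),   # strong solutions agree on features 0 and 1
    ([1, 1, 0, 1], 0.88),
    ([0, 1, 1, 0], 0.60),   # a weak solution votes for feature 2
]
print(average_weighted_combination(solutions))  # [1, 1, 0, 0]
```

The sequential one-point flipping local search would then try flipping each bit of this intermediate mask in turn, keeping any flip that improves classification accuracy.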


Energies ◽  
2018 ◽  
Vol 11 (7) ◽  
pp. 1899 ◽  
Author(s):  
Lin Lin ◽  
Lin Xue ◽  
Zhiqiang Hu ◽  
Nantian Huang

To improve on the accuracy of the day-ahead load forecasts of a single model, a novel modular parallel forecasting model with feature selection was proposed. First, load features were extracted from historical load data over a horizon from the previous 24 h to the previous 168 h, together with calendar features. Second, feature selection combined with the predictor-building process was carried out to select the optimal features for building a reliable predictor for each hour. The final modular model consisted of 24 predictors, each with its own optimal feature subset, for day-ahead load forecasting. New England and Singapore load data were used to evaluate the effectiveness of the proposed method. The results indicated that the accuracy of the proposed modular model was higher than that of the traditional method, and that conducting a feature selection step when building a predictor improved the accuracy of load forecasting.
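The lag-plus-calendar feature extraction in the first step can be sketched directly; the lag choices and calendar encoding here are illustrative, not the paper's exact feature set:

```python
def build_lag_features(load, t, lags=(24, 48, 72, 96, 120, 144, 168)):
    """Feature vector for forecasting hour t of an hourly load series:
    historical loads at 24 h..168 h lags plus simple calendar features
    (hour of day, day of week; the series is assumed to start on day 0)."""
    feats = [load[t - lag] for lag in lags]
    feats.append(t % 24)            # hour of day
    feats.append((t // 24) % 7)     # day of week
    return feats

# A toy hourly series with a perfect daily pattern: load = hour of day.
load = [h % 24 for h in range(24 * 8)]
x = build_lag_features(load, t=24 * 7 + 5)   # hour 5 of day 7
print(x)  # [5, 5, 5, 5, 5, 5, 5, 5, 0]
```

In the modular design, 24 such predictors are trained, one per target hour of the next day, each on the feature subset selection found optimal for that hour.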

