Hybrid binary Butterfly Optimization algorithm and Simulated Annealing for Feature Selection Problem

Feature selection is performed to eliminate irrelevant features and thereby reduce computational overhead. Metaheuristic algorithms have become popular for feature selection because of their effectiveness and flexibility, and hybridizing two or more such metaheuristics has become a common way of solving optimization problems. In this paper, we propose a hybrid wrapper feature selection technique based on the binary butterfly optimization algorithm (bBOA) and Simulated Annealing (SA). The SA is combined with the bBOA in a pipeline fashion: the best solution obtained by the bBOA is passed on to the SA, which tries to improve it further by searching its neighborhood. The SA thus tries to enhance the exploitation capability of the bBOA. The proposed method is tested on twenty datasets from the UCI repository, and the results are compared with five popular feature selection algorithms. The results confirm the effectiveness of the hybrid approach in improving classification accuracy and selecting the optimal feature subset.
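
The SA stage of such a pipeline can be sketched as a neighbourhood search over a binary feature mask. This is a minimal illustration under stated assumptions, not the authors' implementation: the bBOA stage is omitted (the `start` mask stands in for its best solution), and the fitness convention (higher is better), the single-bit-flip neighbourhood and the geometric cooling schedule (`t0=1.0`, `alpha=0.95`) are all assumed.

```python
import math
import random
from typing import Callable, List

Mask = List[int]  # 1 = feature kept, 0 = feature dropped


def simulated_annealing_refine(
    start: Mask,
    fitness: Callable[[Mask], float],  # higher is better (assumed convention)
    t0: float = 1.0,
    alpha: float = 0.95,
    iterations: int = 200,
    seed: int = 0,
) -> Mask:
    """Refine a binary feature mask (e.g. the best bBOA solution) with SA."""
    rng = random.Random(seed)
    current = list(start)
    current_fit = fitness(current)
    best, best_fit = list(current), current_fit
    temperature = t0
    for _ in range(iterations):
        # Neighbour: flip one randomly chosen feature into or out of the subset.
        neighbour = list(current)
        i = rng.randrange(len(neighbour))
        neighbour[i] = 1 - neighbour[i]
        neighbour_fit = fitness(neighbour)
        delta = neighbour_fit - current_fit
        # Metropolis rule: accept improvements; accept worse moves with prob. exp(delta / T).
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_fit = neighbour, neighbour_fit
            if current_fit > best_fit:
                best, best_fit = list(current), current_fit
        temperature *= alpha  # geometric cooling schedule (assumed)
    return best
```

A hypothetical call such as `simulated_annealing_refine(best_boa_mask, wrapper_fitness)` returns a mask at least as fit as `best_boa_mask`, because the best state seen is tracked separately from the current one. In a wrapper setting, `wrapper_fitness` would typically combine classifier accuracy with a penalty on subset size.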

Accelerated Simulated Annealing and Mutation Operator Feature Selection method for Big Data

International Journal of Recent Technology and Engineering - 2; DOI: 10.35940/ijrte.b1712.078219; 2019; Vol 8 (2); pp. 910-916

Keyword(s): Feature Selection; Simulated Annealing; Feature Selection Method; Classification Problem; Feature Subset Selection; Feature Subset; Mutation Operator; KNN Classifier; Optimal Feature Subset; Optimal Feature

Selecting the optimal feature subset from very high-dimensional data is a vital issue. Even once the optimal features are selected, classifying data with those features remains a complicated task. To handle these problems, this work proposes a novel Accelerated Simulated Annealing and Mutation Operator (ASAMO) feature selection algorithm. To solve the classification problem, the Fuzzy Minimal Consistent Class Subset Coverage (FMCCSC) problem is introduced. In FMCCSC, the consistent subset is combined with the K-Nearest Neighbour (KNN) classifier; the result is known as the FMCCSC-KNN classifier. Two data sets from the UCI Machine Learning Repository, Dorothea and Madelon, are used in the experiments on optimal feature selection and classification. The experimental results substantiate the efficiency of the proposed ASAMO with the FMCCSC-KNN classifier compared with Particle Swarm Optimization (PSO) and Accelerated PSO feature selection algorithms.

Product Review Based Customer Sentiment Analysis using an Ensemble of mRMR and Forest Optimization Algorithm (FOA)

International Journal of Applied Metaheuristic Computing; DOI: 10.4018/ijamc.2022010107; 2022; Vol 13 (1); pp. 0-0

Keyword(s): Feature Selection; Sentiment Analysis; Optimization Algorithm; Nearest Neighbor; Hybrid Approach; Support Vector; K Nearest Neighbor; Feature Selection Technique; Feature Selection Problem

This research presents a feature selection approach for sentiment classification using an ensemble-based classifier. It is a hybrid of the minimum redundancy and maximum relevance (mRMR) technique and the Forest Optimization Algorithm (FOA), i.e. mRMR-FOA-based feature selection. Before being applied to sentiment analysis, FOA was used as a feature selection technique on 10 different classification datasets that are publicly available in the UCI Machine Learning Repository. The ensemble-based algorithm uses classifiers such as k-Nearest Neighbor (k-NN), Support Vector Machine (SVM) and Naïve Bayes (NB) on the available datasets. mRMR-FOA uses Blitzer's dataset (customer reviews of electronic products) to select the significant features. Sentiment classification was observed to improve by 12 to 18%. The results are further enhanced by the ensemble of k-NN, NB and SVM, which reaches an accuracy of 88.47% on the sentiment analysis task.
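
The mRMR half of mRMR-FOA ranks features by their relevance to the class minus their redundancy with features already chosen. Below is a minimal sketch of the classic greedy mRMR criterion (difference form) for discrete features, using a plug-in mutual information estimate. The FOA stage and the paper's exact mRMR variant are not reproduced, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def mutual_information(x: Sequence, y: Sequence) -> float:
    """Plug-in estimate of I(X; Y) in nats for two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def mrmr(features: List[Sequence], labels: Sequence, k: int) -> List[int]:
    """Greedily pick k feature indices maximising relevance minus mean redundancy."""
    relevance = [mutual_information(f, labels) for f in features]
    selected: List[int] = []
    remaining = set(range(len(features)))

    def score(j: int) -> float:
        if not selected:
            return relevance[j]
        redundancy = sum(mutual_information(features[j], features[s]) for s in selected)
        return relevance[j] - redundancy / len(selected)

    while remaining and len(selected) < k:
        best = max(sorted(remaining), key=score)  # sorted() makes tie-breaking deterministic
        selected.append(best)
        remaining.remove(best)
    return selected
```

A duplicated feature is equally relevant but maximally redundant, so after its twin is chosen it scores about zero and loses to an independent, informative feature. In the hybrid, a ranking like this would narrow the candidate pool before the FOA search.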

PREDAIP: Computational Prediction and Analysis for Anti-inflammatory Peptide via a Hybrid Feature Selection Technique

Current Bioinformatics; DOI: 10.2174/1574893616666210601111157; 2021; Vol 16

Author(s): Dan Lin; Jialin Yu; Ju Zhang; Huan He; Xinyun Guo; ...

Keyword(s): Machine Learning; Feature Selection; Machine Learning Algorithms; Selection Strategy; Feature Subset; Feature Selection Technique; Selection Technique; Anti Inflammatory; Optimal Feature Subset; Optimal Feature

Background: Anti-inflammatory peptides (AIPs) are potent therapeutic agents for inflammatory and autoimmune disorders owing to their high specificity and minimal toxicity under normal conditions. Identifying AIPs is therefore highly significant for discovering novel and efficient AIP-based therapeutics. Recently, three computational approaches that can effectively identify potential AIPs have been developed based on machine learning algorithms. However, these three existing predictors face several challenges. Objective: A novel machine learning algorithm is needed to improve AIP prediction accuracy. Methods: This study attempts to improve the recognition of AIPs by employing multiple primary sequence-based feature descriptors and an efficient feature selection strategy. Features are first ranked by four enhanced minimal redundancy maximal relevance (emRMR) methods, and wrapper methods based on seven different classifiers and the sequential forward selection (SFS) algorithm are then attached; together these form a hybrid feature selection technique, emRMR-SFS, used to optimize feature vectors. Furthermore, by evaluating seven classifiers trained with the optimal feature subset, we developed an extremely randomized tree (ERT) based predictor named PREDAIP for identifying AIPs. Results: We systematically compared the performance of PREDAIP with the existing tools on an independent test dataset, and the comparison demonstrates the effectiveness and power of PREDAIP. The correlation criteria used in emRMR affect which optimal feature subset is selected at the SFS-wrapper stage, which justifies considering different correlation criteria in emRMR. Conclusion: We expect that PREDAIP will be useful for the high-throughput prediction of AIPs and the development of AIP-based therapeutics.
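
One common way to attach an SFS wrapper to a ranked list is to add features one at a time in rank order and keep the prefix with the best wrapper score. The sketch below assumes that reading; it is not confirmed as PREDAIP's exact procedure. The `evaluate` callback (for example, cross-validated accuracy of one of the seven classifiers) is a hypothetical placeholder.

```python
from typing import Callable, List, Sequence, Tuple


def ranked_forward_selection(
    ranking: Sequence[int],
    evaluate: Callable[[List[int]], float],  # hypothetical wrapper score, higher is better
) -> Tuple[List[int], float]:
    """Add features in rank order; return the best-scoring prefix and its score."""
    best_subset: List[int] = []
    best_score = float("-inf")
    subset: List[int] = []
    for feature in ranking:
        subset.append(feature)
        score = evaluate(subset)
        # Strict '>' keeps the shorter prefix when a longer one only ties.
        if score > best_score:
            best_subset, best_score = list(subset), score
    return best_subset, best_score
```

With several classifiers, this function would be run once per classifier (and once per emRMR ranking), and the combination with the highest score would be retained.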

Modulation Recognition of Digital Multimedia Signal Based on Data Feature Selection

International Journal of Mobile Computing and Multimedia Communications; DOI: 10.4018/ijmcmc.2017070107; 2017; Vol 8 (3); pp. 90-111; Cited By ~ 2

Author(s): Hui Wang; Li Li Guo; Yun Lin

Keyword(s): Feature Selection; Information Entropy; Feature Subset; Selection Algorithm; Feature Selection Algorithm; Modulation Recognition; Signal Modulation; Digital Multimedia; Optimal Feature Subset; Optimal Feature

Automatic modulation recognition is very important for receiver design in broadband multimedia communication systems, and appropriate signal feature extraction and selection algorithms are the key technology for digital multimedia signal recognition. In this paper, information entropy is used to extract four individual features: power spectrum entropy, wavelet energy spectrum entropy, singular spectrum entropy and Renyi entropy. A distance-measure-based feature selection algorithm and Sequential Feature Selection (SFS) are then presented to select the optimal feature subset. Finally, a BP (back-propagation) neural network is used to classify the signal modulation. The simulation results show that the four different information entropy features can be used to classify different signal modulations, and that the feature selection algorithm successfully chooses the optimal feature subset and achieves the best performance.
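
Two of the four entropy features have standard closed forms: the Shannon entropy of the normalised power spectrum, and the Renyi entropy of order alpha, which is (1 / (1 - alpha)) log sum p_i^alpha. The sketch below computes both using only the standard library. The one-sided spectrum, the natural logarithm and the naive O(N^2) DFT are assumptions made for illustration. The wavelet energy and singular spectrum entropies are omitted.

```python
import cmath
import math
from typing import List, Sequence


def normalise(values: Sequence[float]) -> List[float]:
    """Scale non-negative values so they sum to 1 (assumes a non-zero total)."""
    total = sum(values)
    return [v / total for v in values]


def shannon_entropy(p: Sequence[float]) -> float:
    """H(p) = -sum p_i log p_i in nats, skipping zero-probability bins."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def renyi_entropy(p: Sequence[float], alpha: float) -> float:
    """Renyi entropy of order alpha (alpha > 0, alpha != 1), in nats."""
    return math.log(sum(pi ** alpha for pi in p if pi > 0)) / (1 - alpha)


def power_spectrum_entropy(signal: Sequence[float]) -> float:
    """Shannon entropy of the normalised one-sided power spectrum (naive DFT)."""
    n = len(signal)
    power = []
    for k in range(n // 2 + 1):
        x_k = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(x_k) ** 2)
    return shannon_entropy(normalise(power))
```

A pure tone concentrates its power in one bin and gives an entropy near zero, while an impulse has a flat spectrum and gives the maximum, log of the number of bins. This spread is what makes such values usable as modulation features.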

Optimal Feature Subset Selection for Imbalanced Class Data using SMOTE and Binary ALO Algorithm

International Journal of Engineering and Advanced Technology - Regular Issue; DOI: 10.35940/ijeat.c4734.029320; 2020; Vol 9 (3); pp. 344-349

Keyword(s): Feature Selection; Class Imbalance; Classification Performance; Selection Model; Feature Subset Selection; Feature Subset; Spatial Features; Imbalanced Classes; Optimal Feature Subset; Optimal Feature

Feature selection in high-dimensional multispectral data is a difficult machine learning problem because of the imbalanced classes present in the data. Most existing feature selection schemes in the literature ignore the problem of class imbalance: they choose features from the classes with more instances and neglect significant features of the classes with fewer instances. In this paper, the SMOTE (Synthetic Minority Over-sampling Technique) concept is exploited to produce the required samples from the minority classes. The feature selection model is formulated with the objective of reducing the number of features while improving classification performance. This model performs dimensionality reduction by opting for a subset of relevant spectral, textural and spatial features while eliminating redundant features. Binary ALO (Ant Lion Optimizer) is used to solve the feature selection model for the optimal selection of features. The proposed ALO-SVM wrapper is applied to each potential solution obtained during the optimization step. The methodology is tested on a LANDSAT multispectral image.
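
The SMOTE step creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The following is a minimal standard-library sketch of that usual interpolation rule. Euclidean distance and the neighbourhood size `k` are common defaults assumed here, not details taken from the paper. For real work, the imbalanced-learn package provides a maintained `SMOTE` implementation.

```python
import math
import random
from typing import List, Sequence

Point = List[float]


def smote(minority: Sequence[Point], n_new: int, k: int = 5, seed: int = 0) -> List[Point]:
    """Create n_new synthetic samples, each on the segment between a random
    minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic: List[Point] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of base, excluding base itself.
        neighbours = sorted(
            (minority[j] for j in range(len(minority)) if j != i),
            key=lambda p: math.dist(p, base),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic
```

Because every synthetic point lies between two real minority samples, oversampling fills in the minority region rather than duplicating points. The wrapper (here ALO-SVM) then sees a more balanced training set.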

A Novel Feature Selection Method Based on Maximum Likelihood Logistic Regression for Imbalanced Learning in Software Defect Prediction

The International Arab Journal of Information Technology; DOI: 10.34028/iajit/17/5/5; 2020; Vol 17 (5); pp. 721-730

Author(s): Kamal Bashir; Tianrui Li; Mahama Yahaya

Keyword(s): Machine Learning; Logistic Regression; Feature Selection; Maximum Likelihood; Defect Prediction; Feature Subset; Software Defect Prediction; Software Defect; Optimal Feature Subset; Optimal Feature

The most frequently used machine learning feature ranking approaches fail to provide an optimal feature subset for accurate prediction of defective software modules in out-of-sample data. Machine learning Feature Selection (FS) algorithms such as Chi-Square (CS), Information Gain (IG), Gain Ratio (GR), ReliefF (RF) and Symmetric Uncertainty (SU) perform relatively poorly at prediction, even after the class distribution in the training data is balanced. In this study, we propose a novel FS method based on Maximum Likelihood Logistic Regression (MLLR). We apply this method to six software defect datasets, in both sampled and unsampled forms, to select useful features for classification in the context of Software Defect Prediction (SDP). Support Vector Machine (SVM) and Random Forest (RaF) classifiers are applied to the FS subsets derived from the sampled and unsampled datasets. Model performance, measured by the Area Under the Receiver Operating Characteristic Curve (AUC), is compared across all FS methods considered. Analysis of Variance (ANOVA) F-test results validate the superiority of the proposed method over all the other FS techniques, in both sampled and unsampled data. The results confirm that MLLR can be useful for selecting an optimal feature subset for more accurate prediction of defective modules in the software development process.
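
The abstract does not spell out how MLLR turns a fitted model into a feature ranking. One plausible reading is shown here as an assumption, not the authors' procedure. The sketch fits an unregularised logistic regression by gradient ascent on the log-likelihood over z-scored features, then ranks features by absolute coefficient. The learning rate, epoch count and the name `rank_features_by_coefficient` are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def standardise(X: Sequence[Sequence[float]]) -> List[List[float]]:
    """Z-score each column so coefficient magnitudes are comparable."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 500) -> List[float]:
    """Maximise the Bernoulli log-likelihood by batch gradient ascent.
    Returns [intercept, w_1, ..., w_d]."""
    d = len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for row, target in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], row)))
            err = target - p  # derivative of the log-likelihood w.r.t. the linear score
            grad[0] += err
            for j, xj in enumerate(row):
                grad[j + 1] += err * xj
        w = [wj + lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def rank_features_by_coefficient(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[int]:
    """Hypothetical MLLR-style ranking: larger |coefficient| ranks higher."""
    w = fit_logistic(standardise(X), y)
    return sorted(range(len(X[0])), key=lambda j: -abs(w[j + 1]))
```

The top of such a ranking would then be passed to SVM or Random Forest and scored by AUC, as in the comparison described above.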

Binary Genetic Swarm Optimization: A Combination of GA and PSO for Feature Selection

Journal of Intelligent Systems; DOI: 10.1515/jisys-2019-0062; 2019; Vol 29 (1); pp. 1598-1610; Cited By ~ 5

Author(s): Manosij Ghosh; Ritam Guha; Imran Alam; Priyank Lohariwal; Devesh Jalan; ...

Keyword(s): Feature Selection; Combination Method; Sufficient Information; Feature Subset; Final Solution; Swarm Optimization; Intermediate Solution; Optimal Feature Subset; Optimal Feature; Weighted Combination

Feature selection (FS) is a technique that helps to find the optimal feature subset for developing an efficient pattern recognition model. The use of the genetic algorithm (GA) and particle swarm optimization (PSO) in FS is widespread. In this paper, we propose an insightful way to perform FS by amassing information from the candidate solutions produced by GA and PSO. Our aim is to combine the exploitation ability of GA with the exploration capacity of PSO. We name this new model binary genetic swarm optimization (BGSO). The proposed method initially lets GA and PSO run independently. To extract sufficient information from the feature subsets they obtain, BGSO combines their results using an algorithm called the average weighted combination method to produce an intermediate solution. A local search called sequential one-point flipping is then applied to refine the intermediate solution further and generate the final solution. BGSO is applied to 20 popular UCI datasets. The results were obtained with two classifiers, namely k nearest neighbors (KNN) and multi-layer perceptron (MLP). The overall results and comparisons show that the proposed method outperforms the constituent algorithms on 16 and 14 datasets using KNN and MLP, respectively, whereas among the constituent algorithms, GA achieves the best classification accuracy on 2 and 7 datasets and PSO on 2 and 4 datasets, respectively, for the same classifiers. This demonstrates the applicability and usefulness of the method in the domain of FS.
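
The two BGSO post-processing steps can be sketched on binary masks. The assumptions are as follows: each solution's weight is its fitness, the weighted per-feature average is thresholded at 0.5, and one-point flipping visits features left to right and keeps a flip only when it strictly improves fitness. The paper's exact weighting and acceptance rules may differ.

```python
from typing import Callable, List, Sequence

Mask = List[int]  # 1 = feature selected


def weighted_combination(solutions: Sequence[Mask], weights: Sequence[float],
                         threshold: float = 0.5) -> Mask:
    """Per-feature weighted average of binary masks, thresholded back to 0/1."""
    total = sum(weights)
    n_features = len(solutions[0])
    return [
        1 if sum(w * s[j] for s, w in zip(solutions, weights)) / total >= threshold else 0
        for j in range(n_features)
    ]


def sequential_one_point_flip(mask: Mask, fitness: Callable[[Mask], float]) -> Mask:
    """Visit features in order; keep a single-bit flip only if fitness improves."""
    current = list(mask)
    current_fit = fitness(current)
    for j in range(len(current)):
        candidate = list(current)
        candidate[j] = 1 - candidate[j]
        candidate_fit = fitness(candidate)
        if candidate_fit > current_fit:
            current, current_fit = candidate, candidate_fit
    return current
```

For example, combining a GA mask `[1, 1, 0, 0]` (weight 0.9) with a PSO mask `[1, 0, 1, 0]` (weight 0.6) keeps the features on which the fitter solution dominates, giving `[1, 1, 0, 0]`. The flipping pass then refines this intermediate solution with one fitness evaluation per feature.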

Modular Predictor for Day-Ahead Load Forecasting and Feature Selection for Different Hours

Energies; DOI: 10.3390/en11071899; 2018; Vol 11 (7); pp. 1899; Cited By ~ 3

Author(s): Lin Lin; Lin Xue; Zhiqiang Hu; Nantian Huang

Keyword(s): Feature Selection; New England; Load Forecasting; Feature Subset; Single Model; Modular Model; Selection Step; Selection For; Optimal Feature Subset; Optimal Feature

To improve on the accuracy of day-ahead load forecasts made by a single model, a novel modular parallel forecasting model with feature selection was proposed. First, load features were extracted from historical load data with a horizon ranging from the previous 24 h to the previous 168 h, taking calendar features into account. Second, feature selection combined with a predictor was carried out to select the optimal features for building a reliable predictor for each hour. The final modular model consisted of 24 predictors, each with its own optimal feature subset, for day-ahead load forecasting. New England and Singapore load data were used to evaluate the effectiveness of the proposed method. The results indicated that the proposed modular model was more accurate than the traditional method. Furthermore, conducting a feature selection step when building a predictor improved the accuracy of load forecasting.
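
The modular structure can be illustrated by how the candidate inputs are assembled and split into 24 per-hour training sets. The sketch makes several assumptions: hourly data, every lag from 24 h to 168 h inclusive, and two calendar fields (hour of day, day of week, with index 0 taken as 00:00 on day 0). The per-hour feature selection and predictor training are not shown, and all names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[List[float], float]  # (candidate features, load at the target hour)


def lag_features(load: Sequence[float], t: int,
                 min_lag: int = 24, max_lag: int = 168) -> List[float]:
    """Load values from min_lag to max_lag hours before hour t (inclusive)."""
    if t - max_lag < 0:
        raise ValueError("not enough history before hour t")
    return [load[t - lag] for lag in range(min_lag, max_lag + 1)]


def calendar_features(t: int) -> List[float]:
    """Hour of day and day of week; assumes index 0 is 00:00 on day 0 of the week."""
    return [float(t % 24), float((t // 24) % 7)]


def split_by_hour(load: Sequence[float], start: int) -> Dict[int, List[Sample]]:
    """One training set per hour of day, i.e. one per modular predictor."""
    datasets: Dict[int, List[Sample]] = {h: [] for h in range(24)}
    for t in range(start, len(load)):
        features = lag_features(load, t) + calendar_features(t)
        datasets[t % 24].append((features, load[t]))
    return datasets
```

Each of the 24 datasets would then go through its own feature selection, so the predictor for, say, 08:00 can keep a different subset of the 147 candidate inputs than the predictor for 20:00.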

AN OPTIMAL FEATURE SUBSET SELECTION METHOD BASED ON DISTANCE DISCRIMINANT AND DISTRIBUTION OVERLAPPING

International Journal of Pattern Recognition and Artificial Intelligence; DOI: 10.1142/s0218001409007715; 2009; Vol 23 (08); pp. 1577-1597; Cited By ~ 5

Author(s): Jianning Liang; Su Yang; Yuanyuan Wang

Keyword(s): Feature Selection; Computational Cost; Difficult Problem; Exhaustive Search; Feature Subset Selection; Search Problem; Feature Subset; Ranking Problem; Optimal Feature Subset; Optimal Feature

The goal of feature selection is to search for the optimal feature subset with respect to the evaluation function. Exhaustively searching all possible feature subsets has a high computational cost. Alternative suboptimal methods are more efficient and practical, but they cannot guarantee globally optimal results. We propose a new feature selection algorithm based on distance discriminant and distribution overlapping (HFSDD) for continuous features, which overcomes the drawbacks of both exhaustive search approaches and suboptimal methods. The proposed method is able to find the optimal feature subset without exhaustive search or a Branch and Bound algorithm. The most difficult problem in optimal feature selection, the search problem, is converted into a feature ranking problem following a rigorous theoretical proof, so that the computational complexity can be greatly reduced. Since the distribution of overlapping degrees between every pair of classes provides useful information for feature selection, HFSDD also takes it into account by using a new approach to estimate the overlapping degrees. In this sense, HFSDD is a distance discriminant and distribution overlapping based solution. HFSDD was compared with ReliefF and mrmrMID on ten data sets, and the experimental results show that HFSDD outperforms the other methods.

Gradient-Based Multi-Objective Feature Selection for Gait Mode Recognition of Transfemoral Amputees

DOI: 10.20944/preprints201811.0094.v1; 2018

Author(s): Gholamreza Khademi; Hanieh Mohammadi; Dan Simon

Keyword(s): Feature Selection; Feature Subset Selection; Feature Subset; User Intent; Multi Objective; Mode Recognition; Gradient Based; Transfemoral Amputees; Optimal Feature Subset; Optimal Feature

One control challenge in prosthetic legs is seamless transition from one gait mode to another. User intent recognition (UIR) is a high-level controller that tells a low-level controller to switch to the identified activity mode, depending on the user’s intent and environment. We propose a new framework to design an optimal UIR system with simultaneous maximum performance and parsimony for gait mode recognition. We use multi-objective optimization (MOO) to find an optimal feature subset that creates a trade-off between these two conflicting objectives. The main contribution of this paper is two-fold: (1) a new gradient-based multi-objective feature selection (GMOFS) method for optimal UIR design; and (2) the application of advanced evolutionary MOO methods for UIR. GMOFS is an embedded method that simultaneously performs feature selection and classification by incorporating an elastic net in multilayer perceptron neural network training. Experimental data are collected from six subjects, including three able-bodied subjects and three transfemoral amputees. We implement GMOFS and four variants of multi-objective biogeography-based optimization (MOBBO) for optimal feature subset selection, and we compare their performances using normalized hypervolume and relative coverage. GMOFS demonstrates competitive performance compared to the four MOBBO methods. We achieve a mean classification accuracy of 97.14% ± 1.51% and 98.45% ± 1.22% with the optimal selected subset for able-bodied and amputee subjects, respectively, while using only 23% of the available features. Results thus indicate the potential of advanced optimization methods to simultaneously achieve accurate, reliable, and compact UIR for locomotion mode detection of lower-limb amputees with prostheses.
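
The GMOFS and MOBBO fronts are compared using normalized hypervolume. Consider two minimised objectives, such as classification error and the fraction of features used (an assumed encoding of "performance" and "parsimony"). For these, the hypervolume is the area dominated by the Pareto front up to a reference point and can be computed with a simple sweep. With both objectives in [0, 1] and the reference point (1, 1), the value is already a fraction of the unit square. That is one possible normalisation, not necessarily the paper's.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (classification error, fraction of features used)


def pareto_front(points: Sequence[Point]) -> List[Point]:
    """Keep points not dominated by any other point (both objectives minimised)."""
    return [
        p for p in points
        if not any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points)
    ]


def hypervolume_2d(points: Sequence[Point], ref: Point = (1.0, 1.0)) -> float:
    """Area dominated by the front and bounded by ref, via a sweep over x."""
    front = sorted(p for p in pareto_front(points) if p[0] < ref[0] and p[1] < ref[1])
    area = 0.0
    prev_y = ref[1]
    for x, y in front:  # x ascending, so y descends along a Pareto front
        area += (ref[0] - x) * (prev_y - y)
        prev_y = y
    return area
```

A larger value means the method found subsets that are simultaneously more accurate and more compact. For example, the front {(0.1, 0.6), (0.3, 0.2)} covers 0.64 of the unit square.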
