A Many-Objective Simultaneous Feature Selection and Discretization for LCS-Based Gesture Recognition

Martin J.-D. Otis; Julien Vandewynckel

doi:10.3390/app11219787

A Many-Objective Simultaneous Feature Selection and Discretization for LCS-Based Gesture Recognition

Applied Sciences ◽

10.3390/app11219787 ◽

2021 ◽

Vol 11 (21) ◽

pp. 9787

Author(s):

Martin J.-D. Otis ◽

Julien Vandewynckel

Keyword(s):

Feature Selection ◽

Recognition Performance ◽

Reduction Rate ◽

Parameter Tuning ◽

Problem Formulation ◽

Feature Reduction ◽

Variable Length ◽

Feature Subset Selection ◽

Longest Common Subsequence ◽

Feature Subset

Discretization and feature selection are two relevant techniques for dimensionality reduction. The first transforms a set of continuous attributes into discrete ones, and the second removes irrelevant and redundant features; together, these two methods often lead to more specific and concise data. In this paper, we propose to deal simultaneously with optimal feature subset selection, discretization, and classifier parameter tuning. As an illustration, the proposed problem formulation is addressed using a constrained many-objective optimization algorithm based on dominance and decomposition (C-MOEA/DD) and a limited-memory implementation of the warping longest common subsequence algorithm (WarpingLCSS). In addition, the discretization sub-problem is addressed using a variable-length representation, along with a variable-length crossover, to overcome the need to specify in advance the number of elements defining the discretization scheme. We conduct experiments on a real-world benchmark dataset; compare two discretization criteria as the discretization objective, namely Ameva and ur-CAIM; and analyze recognition performance and reduction capabilities. Our results show that our approach outperforms previously reported results by up to 11% and achieves an average feature reduction rate of 80%.
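
To make the combination of discretization and subsequence matching concrete, the following is a minimal sketch, not the authors' WarpingLCSS or C-MOEA/DD implementation. It maps continuous samples to interval indices using a list of cut points (which may have any length, echoing the variable-length scheme) and then scores two discretized sequences with the classic longest common subsequence length. The function names, cut points, and sample values are all hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence


def discretize(values: Sequence[float], cut_points: Sequence[float]) -> List[int]:
    """Map each continuous value to the index of the interval it falls in."""
    cuts = sorted(cut_points)
    return [bisect_right(cuts, v) for v in values]


def lcs_length(a: Sequence[int], b: Sequence[int]) -> int:
    """Classic dynamic-programming LCS length (plain LCS, not WarpingLCSS)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


# Hypothetical data: a gesture template and an incoming signal window.
cut_points = [0.3, 0.7]  # two cut points -> three discrete symbols
template = discretize([0.1, 0.4, 0.9, 0.5], cut_points)
stream = discretize([0.2, 0.8, 0.95, 0.6, 0.1], cut_points)
similarity = lcs_length(template, stream)
```

In an optimization setting such as the one described, the cut points (and which features are kept) would be decision variables, and a score of this kind would feed into the recognition objective.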

Improved TLBO-JAYA Algorithm for Subset Feature Selection and Parameter Optimisation in Intrusion Detection System

Complexity ◽

10.1155/2020/5287684 ◽

2020 ◽

Vol 2020 ◽

pp. 1-18 ◽

Cited By ~ 1

Author(s):

Mohammad Aljanabi ◽

Mohd Arfian Ismail ◽

Vitaly Mezhuyev

Keyword(s):

Support Vector Machine ◽

Feature Selection ◽

Intrusion Detection ◽

Intrusion Detection System ◽

Detection System ◽

Parameter Tuning ◽

Feature Subset Selection ◽

Supervised Machine Learning ◽

Support Vector ◽

Feature Subset

Many optimisation-based intrusion detection algorithms have been developed and are widely used for intrusion identification. This condition is attributed to the increasing number of audit data features and the decreasing performance of human-based smart intrusion detection systems regarding classification accuracy, false alarm rate, and classification time. Feature selection and classifier parameter tuning are important factors that affect the performance of any intrusion detection system. In this paper, an improved intrusion detection algorithm for multiclass classification is presented and discussed in detail. The proposed method combines the improved teaching-learning-based optimisation (ITLBO) algorithm, the improved parallel JAYA (IPJAYA) algorithm, and a support vector machine (SVM). ITLBO with a supervised machine learning (ML) technique is used for feature subset selection (FSS). Selecting the smallest number of features without affecting the accuracy of the result is a multiobjective optimisation problem. This work proposes ITLBO as an FSS mechanism, and its algorithm-specific, parameterless concept (no parameter tuning is required during optimisation) is explored. IPJAYA is used to update the C and gamma parameters of the SVM. Several experiments were performed on the prominent intrusion ML dataset, where significant enhancements were observed with the suggested ITLBO-IPJAYA-SVM algorithm compared with the classical TLBO and JAYA algorithms.

A novel feature selection algorithm based on damping oscillation theory

PLoS ONE ◽

10.1371/journal.pone.0255307 ◽

2021 ◽

Vol 16 (8) ◽

pp. e0255307

Author(s):

Fujun Wang ◽

Xing Wang

Keyword(s):

Feature Selection ◽

Optimization Algorithm ◽

Euclidean Distance ◽

Oscillation Theory ◽

Feature Subset Selection ◽

Support Vector ◽

Data Sets ◽

Feature Subset ◽

Selection Algorithm ◽

Filter Model

Feature selection is an important task in big data analysis and information retrieval processing. It reduces the number of features by removing noisy and extraneous data. In this paper, a feature subset selection algorithm based on damping oscillation theory and a support vector machine classifier is proposed. This algorithm is called the Maximum Kendall coefficient Maximum Euclidean Distance Improved Gray Wolf Optimization algorithm (MKMDIGWO). In MKMDIGWO, first, a filter model based on the Kendall coefficient and Euclidean distance is proposed, which is used to measure the correlation and redundancy of the candidate feature subset. Second, the wrapper model is an improved grey wolf optimization algorithm whose position update formula has been modified in order to achieve optimal results. Third, the filter model and the wrapper model are dynamically adjusted by the damping oscillation theory in order to find an optimal feature subset. MKMDIGWO therefore achieves both the efficiency of the filter model and the high precision of the wrapper model. Experimental results on five UCI public data sets and two microarray data sets demonstrate that the MKMDIGWO algorithm achieves higher classification accuracy than four other state-of-the-art algorithms. The maximum ACC value of the MKMDIGWO algorithm is at least 0.5% higher than other algorithms on 10 data sets.

Liver Cancer Classification Model Using Hybrid Feature Selection Based on Class-Dependent Technique for the Central Region of Thailand

Information ◽

10.3390/info10060187 ◽

2019 ◽

Vol 10 (6) ◽

pp. 187

Author(s):

Rattanawadee Panthong ◽

Anongnart Srivihok

Keyword(s):

Feature Selection ◽

Liver Cancer ◽

Predictive Model ◽

Information Gain ◽

Classification Performance ◽

Cancer Classification ◽

Feature Subset Selection ◽

Classification Model ◽

Feature Subset ◽

Cancer Data

Liver cancer data always consist of a large number of multidimensional datasets. A dataset with a very large number of features and multiple classes may contain features that are irrelevant to pattern classification in machine learning. Hence, feature selection improves the performance of the classification model to achieve maximum classification accuracy. The aims of the present study were to find the best feature subset and to evaluate the classification performance of the predictive model. This paper proposes a hybrid feature selection approach that combines information gain and sequential forward selection based on the class-dependent technique (IGSFS-CD) for the liver cancer classification model. Two different classifiers (decision tree and naïve Bayes) were used to evaluate feature subsets. The liver cancer datasets were obtained from the Cancer Hospital Thailand database. Three ensemble methods (ensemble classifiers, bagging, and AdaBoost) were applied to improve the performance of classification. The IGSFS-CD method provided good accuracy of 78.36% (sensitivity 0.7841 and specificity 0.9159) on LC_dataset-1. In addition, LC_dataset II delivered the best performance with an accuracy of 84.82% (sensitivity 0.8481 and specificity 0.9437). The IGSFS-CD method achieved better classification performance compared to the class-independent method. Furthermore, the best feature subset selection could help reduce the complexity of the predictive model.
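
As an illustration of the information gain criterion mentioned in the abstract, here is a minimal sketch for a single categorical feature. It shows only the standard entropy-based definition; the class-dependent variant and the sequential forward selection step of IGSFS-CD are not reproduced, and the function names and toy data are hypothetical.

```python
from collections import Counter
from math import log2
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    groups: Dict[Hashable, List[Hashable]] = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    total = len(labels)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: one feature separates the classes, the other does not.
labels = ["pos", "pos", "neg", "neg"]
informative = ["high", "high", "low", "low"]
uninformative = ["a", "b", "a", "b"]
gain_informative = information_gain(informative, labels)
gain_uninformative = information_gain(uninformative, labels)
```

Ranking features by a score of this kind is one common way to seed a forward selection search with the most promising candidates.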

A New Hybrid Feature Subset Selection Framework Based on Binary Genetic Algorithm and Information Theory

International Journal of Computational Intelligence and Applications ◽

10.1142/s1469026819500202 ◽

2019 ◽

Vol 18 (03) ◽

pp. 1950020 ◽

Cited By ~ 13

Author(s):

Alok Kumar Shukla ◽

Pradeep Singh ◽

Manu Vardhan

Keyword(s):

Genetic Algorithm ◽

Feature Selection ◽

Classification Accuracy ◽

B Cell Lymphoma ◽

Feature Subset Selection ◽

Classification Model ◽

Significant Feature ◽

Support Vector ◽

Feature Subset ◽

Binary Genetic Algorithm

The explosion of high-dimensional datasets in scientific repositories has been encouraging interdisciplinary research on data mining, pattern recognition and bioinformatics. The fundamental problem for an individual Feature Selection (FS) method is to extract informative features for a classification model and to identify malignant disease at low computational cost. In addition, existing FS approaches overlook the fact that for a given cardinality, there can be several subsets with similar information. This paper introduces a novel hybrid FS algorithm, called Filter-Wrapper Feature Selection (FWFS), for classification problems and also addresses the limitations of existing methods. In the proposed model, the front-end filter ranking method, Conditional Mutual Information Maximization (CMIM), selects the high-ranked feature subset, while the succeeding method, a Binary Genetic Algorithm (BGA), accelerates the search in identifying the significant feature subsets. One of the merits of the proposed method is that, unlike an exhaustive method, it speeds up the FS procedure without sacrificing classification accuracy on the reduced dataset when a learning model is applied to the selected subsets of features. The efficacy of the proposed FWFS method is examined with a Naive Bayes (NB) classifier, which works as a fitness function. The effectiveness of the selected feature subset is evaluated using numerous classifiers on five biological datasets and five UCI datasets of varied dimensionality and number of instances. The experimental results show that the proposed method significantly reduces the number of features and outperforms the existing methods. For microarray datasets, the lowest classification accuracy is 61.24% on the SRBCT dataset and the highest accuracy is 99.32% on Diffuse large B-cell lymphoma (DLBCL). In UCI datasets, the lowest classification accuracy is 40.04% on Lymphography using k-nearest neighbor (k-NN) and the highest classification accuracy is 99.05% on Ionosphere using a support vector machine (SVM).

Addressing Low Dimensionality Feature Subset Selection: ReliefF(-k) or Extended Correlation-Based Feature Selection(eCFS)?

Advances in Intelligent Systems and Computing - 14th International Conference on Soft Computing Models in Industrial and Environmental Applications (SOCO 2019) ◽

10.1007/978-3-030-20055-8_24 ◽

2019 ◽

pp. 251-260 ◽

Cited By ~ 1

Author(s):

Antonio J. Tallón-Ballesteros ◽

Luís Cavique ◽

Simon Fong

Keyword(s):

Feature Selection ◽

Subset Selection ◽

Feature Subset Selection ◽

Feature Subset ◽

Low Dimensionality ◽

Correlation Based Feature Selection

Feature Selection Based on Binary Tree Growth Algorithm for the Classification of Myoelectric Signals

Machines ◽

10.3390/machines6040065 ◽

2018 ◽

Vol 6 (4) ◽

pp. 65 ◽

Cited By ~ 4

Author(s):

Jingwei Too ◽

Abdul Abdullah ◽

Norhashimah Mohd Saad ◽

Nursabillilah Mohd Ali

Keyword(s):

Feature Selection ◽

Tree Growth ◽

Binary Tree ◽

Feature Vector ◽

Classification Performance ◽

Feature Reduction ◽

Feature Subset ◽

Selection Methods ◽

Time Frequency ◽

Mutation Operators

Electromyography (EMG) has been widely used in rehabilitation and myoelectric prosthetic applications. However, a recent increase in the number of EMG features has led to a high-dimensional feature vector. This in turn degrades the classification performance and increases the complexity of the recognition system. In this paper, we propose two new feature selection methods based on a tree growth algorithm (TGA) for EMG signal classification. In the first approach, two transfer functions are implemented to convert the continuous TGA into a binary version. In the second approach, swap, crossover, and mutation operators are introduced in a modified binary tree growth algorithm (MBTGA) to enhance the exploitation and exploration behaviors. In this study, the short-time Fourier transform (STFT) is employed to transform the EMG signals into a time-frequency representation. The features are then extracted from the STFT coefficients to form a feature vector. Afterward, the proposed feature selection methods are applied to find the best feature subset from a large available feature set. The experimental results show the superiority of MBTGA not only in terms of feature reduction, but also in classification performance.

Angle Modulated Artificial Bee Colony Algorithms for Feature Selection

Applied Computational Intelligence and Soft Computing ◽

10.1155/2016/9569161 ◽

2016 ◽

Vol 2016 ◽

pp. 1-6 ◽

Cited By ~ 7

Author(s):

Gürcan Yavuz ◽

Doğan Aydin

Keyword(s):

Feature Selection ◽

Artificial Bee Colony ◽

Continuous Optimization ◽

Subset Selection ◽

Machine Intelligence ◽

Feature Subset Selection ◽

High Dimensional ◽

Feature Subset ◽

Bee Colony ◽

Angle Modulation

Optimal feature subset selection is an important and difficult task for pattern classification, data mining, and machine intelligence applications. The objective of feature subset selection is to eliminate irrelevant and noisy features in order to select optimum feature subsets and increase accuracy. A large number of features in a dataset increases the computational complexity, thus leading to performance degradation. In this paper, to overcome this problem, the angle modulation technique is used to reduce the feature subset selection problem to a four-dimensional continuous optimization problem instead of representing the problem as a high-dimensional bit vector. To demonstrate the effectiveness of this problem representation and to determine the efficiency of the proposed method, six variants of Artificial Bee Colony (ABC) algorithms employ angle modulation for feature selection. Experimental results on six high-dimensional datasets show that Angle Modulated ABC algorithms improved the classification accuracy with fewer features.
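
The idea of encoding a long feature mask with only four continuous parameters can be sketched as follows. This uses the generating function commonly given for angle modulation, g(x) = sin(2π(x − a) · b · cos(2π(x − a) · c)) + d, with a bit set to 1 when g(x) > 0. Whether the paper uses exactly this form and these sampling positions is an assumption, and the function name is hypothetical.

```python
from math import cos, pi, sin
from typing import List


def angle_modulated_mask(a: float, b: float, c: float, d: float,
                         n_features: int, step: float = 0.1) -> List[int]:
    """Turn four continuous parameters into an n_features-long bit mask.

    The sampling positions x = 0, step, 2*step, ... are an assumption made
    for this sketch; other spacings are possible.
    """
    mask = []
    for i in range(n_features):
        x = i * step
        g = sin(2 * pi * (x - a) * b * cos(2 * pi * (x - a) * c)) + d
        mask.append(1 if g > 0 else 0)  # bit i selects feature i
    return mask


# Hypothetical parameter vector, as an optimizer such as ABC might propose.
mask = angle_modulated_mask(a=0.0, b=1.0, c=0.5, d=0.1, n_features=12)
selected = [i for i, bit in enumerate(mask) if bit]
```

The optimizer then searches over (a, b, c, d) only, so the search space stays four-dimensional regardless of how many features the dataset has.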

Optimal Feature Subset Selection for Imbalanced Class Data using SMOTE and Binary ALO Algorithm

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.c4734.029320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 344-349

Keyword(s):

Feature Selection ◽

Class Imbalance ◽

Classification Performance ◽

Selection Model ◽

Feature Subset Selection ◽

Feature Subset ◽

Spatial Features ◽

Imbalanced Classes ◽

Optimal Feature Subset ◽

Optimal Feature

Feature selection in multispectral high-dimensional data is a hard machine learning problem because of the imbalanced classes present in the data. Most of the existing feature selection schemes in the literature ignore the problem of class imbalance, choosing features from the classes that have more instances and missing significant features of the classes that have fewer instances. In this paper, the SMOTE concept is exploited to produce the required samples from minority classes. A feature selection model is formulated with the objective of reducing the number of features while improving classification performance. This model is based on dimensionality reduction by opting for a subset of relevant spectral, textural and spatial features while eliminating the redundant features. Binary ALO is used to solve the feature selection model for optimal selection of features. The proposed ALO-SVM with wrapper concept is applied to each potential solution obtained during the optimization step. The methodology is tested on a LANDSAT multispectral image.
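
The oversampling step can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic sample is a random interpolation between a minority-class point and one of its nearest minority-class neighbours. This is a simplified illustration under stated assumptions, not the paper's pipeline; the function name, the value of k, and the toy points are hypothetical.

```python
import random
from typing import List, Sequence


def smote_like_samples(minority: Sequence[Sequence[float]], n_new: int,
                       k: int = 2, seed: int = 0) -> List[List[float]]:
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((u - v) ** 2 for u, v in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([u + gap * (v - u) for u, v in zip(base, other)])
    return synthetic


# Hypothetical minority-class points (requires at least two points).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_like_samples(minority, n_new=5, k=2, seed=1)
```

Because every synthetic point lies on a segment between two real minority points, the new samples stay inside the region the minority class already occupies.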

A Hybrid Feature Selection Method for Improve the Accuracy of Medical Classification Process

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.a9624.1111121 ◽

2021 ◽

Vol 11 (1) ◽

pp. 50-55

Author(s):

Maria Mohammad Yousef

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Dimensionality Reduction ◽

Classification Accuracy ◽

Fitness Function ◽

Machine Learning Algorithms ◽

Feature Subset Selection ◽

High Dimensionality ◽

Support Vector ◽

Feature Subset

Medical dataset classification has become one of the biggest problems in data mining research. Every database has a given number of features, but some of these features can be redundant or even harmful and can disrupt the classification process; this is known as the high dimensionality problem. Dimensionality reduction in data preprocessing is critical for increasing the performance of machine learning algorithms. In addition, feature subset selection contributes to dimensionality reduction and gives a significant improvement in classification accuracy. In this paper, we propose a new hybrid feature selection approach based on a genetic algorithm (GA) assisted by k-Nearest Neighbor (kNN) to deal with issues of high dimensionality in biomedical data classification. The proposed method first combines GA and kNN for feature selection to find the optimal subset of features, with the classification accuracy of the kNN method used as the fitness function for the GA. After the best subset of features is selected, a Support Vector Machine (SVM) is used as the classifier. The proposed method is evaluated on five medical datasets from the UCI Machine Learning Repository. The suggested technique performs well on these datasets, achieving higher classification accuracy while using fewer features.
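
The wrapper described above, a GA whose fitness is kNN accuracy on the selected features, can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the paper's implementation: it uses leave-one-out accuracy, truncation selection, one-point crossover, and bit-flip mutation, all of which are choices made for this sketch, and it omits the final SVM stage. All names and data are hypothetical.

```python
import random
from typing import List, Sequence


def loo_knn_accuracy(X: Sequence[Sequence[float]], y: Sequence[int],
                     mask: Sequence[int], k: int = 1) -> float:
    """Leave-one-out kNN accuracy using only the features where mask is 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    correct = 0
    for i, xi in enumerate(X):
        dists = sorted(
            (sum((xi[j] - xo[j]) ** 2 for j in cols), y[o])
            for o, xo in enumerate(X) if o != i
        )
        votes = [label for _, label in dists[:k]]
        predicted = max(set(votes), key=votes.count)
        correct += predicted == y[i]
    return correct / len(X)


def ga_select(X, y, generations: int = 20, pop_size: int = 8,
              p_mut: float = 0.1, seed: int = 0) -> List[int]:
    """Tiny GA over bit masks; needs at least two features."""
    rng = random.Random(seed)
    n = len(X[0])
    fitness = lambda m: loo_knn_accuracy(X, y, m)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # truncation selection (elitist)
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, n)           # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - b if rng.random() < p_mut else b for b in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)


# Hypothetical data: feature 0 separates the classes, feature 1 is misleading.
X = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0], [1.0, 5.1], [1.1, 0.9], [1.2, 9.1]]
y = [0, 0, 0, 1, 1, 1]
best_mask = ga_select(X, y)
```

In practice the fitness would be computed with cross-validation on real data, and the selected columns would then be passed to the downstream classifier.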

Accelerated Simulated Annealing and Mutation Operator Feature Selection method for Big Data

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1712.078219 ◽

2019 ◽

Vol 8 (2) ◽

pp. 910-916

Keyword(s):

Feature Selection ◽

Simulated Annealing ◽

Feature Selection Method ◽

Classification Problem ◽

Feature Subset Selection ◽

Feature Subset ◽

Mutation Operator ◽

Knn Classifier ◽

Optimal Feature Subset ◽

Optimal Feature

Optimal feature subset selection over very high-dimensional data is a vital issue. Even once the optimal features are selected, classification using those features remains a complicated task. To handle these problems, a novel Accelerated Simulated Annealing and Mutation Operator (ASAMO) feature selection algorithm is suggested in this work. To solve the classification problem, the Fuzzy Minimal Consistent Class Subset Coverage (FMCCSC) problem is introduced. In FMCCSC, a consistent subset is combined with the K-Nearest Neighbour (KNN) classifier, giving the FMCCSC-KNN classifier. Two data sets from the UCI Machine Learning Repository, Dorothea and Madelon, are used in experiments on optimal feature selection and classification. The experimental results substantiate the efficiency of the proposed ASAMO with the FMCCSC-KNN classifier compared to Particle Swarm Optimization (PSO) and Accelerated PSO feature selection algorithms.
