A novel feature selection algorithm based on damping oscillation theory

Feature selection is an important task in big data analysis and information retrieval processing. It reduces the number of features by removing noise, extraneous data. In this paper, one feature subset selection algorithm based on damping oscillation theory and support vector machine classifier is proposed. This algorithm is called the Maximum Kendall coefficient Maximum Euclidean Distance Improved Gray Wolf Optimization algorithm (MKMDIGWO). In MKMDIGWO, first, a filter model based on Kendall coefficient and Euclidean distance is proposed, which is used to measure the correlation and redundancy of the candidate feature subset. Second, the wrapper model is an improved grey wolf optimization algorithm, in which its position update formula has been improved in order to achieve optimal results. Third, the filter model and the wrapper model are dynamically adjusted by the damping oscillation theory to achieve the effect of finding an optimal feature subset. Therefore, MKMDIGWO achieves both the efficiency of the filter model and the high precision of the wrapper model. Experimental results on five UCI public data sets and two microarray data sets have demonstrated the higher classification accuracy of the MKMDIGWO algorithm than that of other four state-of-the-art algorithms. The maximum ACC value of the MKMDIGWO algorithm is at least 0.5% higher than other algorithms on 10 data sets.

Download Full-text

Ensemble Based Classification of Sentiments Using Forest Optimization Algorithm

Data ◽

10.3390/data4020076 ◽

2019 ◽

Vol 4 (2) ◽

pp. 76 ◽

Cited By ~ 2

Author(s):

Mehreen Naz ◽

Kashif Zafar ◽

Ayesha Khan

Keyword(s):

Feature Selection ◽

Optimization Algorithm ◽

Subset Selection ◽

Feature Subset Selection ◽

Selection Problem ◽

Support Vector ◽

Feature Subset ◽

Hybrid Technique ◽

Computational Performance

Feature subset selection is a process to choose a set of relevant features from a high dimensionality dataset to improve the performance of classifiers. The meaningful words extracted from data forms a set of features for sentiment analysis. Many evolutionary algorithms, like the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO), have been applied to feature subset selection problem and computational performance can still be improved. This research presents a solution to feature subset selection problem for classification of sentiments using ensemble-based classifiers. It consists of a hybrid technique of minimum redundancy and maximum relevance (mRMR) and Forest Optimization Algorithm (FOA)-based feature selection. Ensemble-based classification is implemented to optimize the results of individual classifiers. The Forest Optimization Algorithm as a feature selection technique has been applied to various classification datasets from the UCI machine learning repository. The classifiers used for ensemble methods for UCI repository datasets are the k-Nearest Neighbor (k-NN) and Naïve Bayes (NB). For the classification of sentiments, 15–20% improvement has been recorded. The dataset used for classification of sentiments is Blitzer’s dataset consisting of reviews of electronic products. The results are further improved by ensemble of k-NN, NB, and Support Vector Machine (SVM) with an accuracy of 95% for the classification of sentiment tasks.

Download Full-text

Improved Intrusion Detection Algorithm based on TLBO and GA Algorithms

The International Arab Journal of Information Technology ◽

10.34028/iajit/18/2/5 ◽

2021 ◽

Vol 18 (2) ◽

Keyword(s):

Machine Learning ◽

Intrusion Detection ◽

Optimization Algorithm ◽

Feature Subset Selection ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Support Vector ◽

Feature Subset ◽

Teaching Learning Based Optimization ◽

Teaching Learning

Optimization algorithms are widely used for the identification of intrusion. This is attributable to the increasing number of audit data features and the decreasing performance of human-based smart Intrusion Detection Systems (IDS) regarding classification accuracy and training time. In this paper, an improved method for intrusion detection for binary classification was presented and discussed in detail. The proposed method combined the New Teaching-Learning-Based Optimization Algorithm (NTLBO), Support Vector Machine (SVM), Extreme Learning Machine (ELM), and Logistic Regression (LR) (feature selection and weighting) NTLBO algorithm with supervised machine learning techniques for Feature Subset Selection (FSS). The process of selecting the least number of features without any effect on the result accuracy in FSS was considered a multi-objective optimization problem. The NTLBO was proposed in this paper as an FSS mechanism; its algorithm-specific, parameter-less concept (which requires no parameter tuning during an optimization) was explored. The experiments were performed on the prominent intrusion machine-learning datasets (KDDCUP’99 and CICIDS 2017), where significant enhancements were observed with the suggested NTLBO algorithm as compared to the classical Teaching-Learning-Based Optimization algorithm (TLBO), NTLBO presented better results than TLBO and many existing works. The results showed that NTLBO reached 100% accuracy for KDDCUP’99 dataset and 97% for CICIDS dataset

Download Full-text

A New Hybrid Feature Subset Selection Framework Based on Binary Genetic Algorithm and Information Theory

International Journal of Computational Intelligence and Applications ◽

10.1142/s1469026819500202 ◽

2019 ◽

Vol 18 (03) ◽

pp. 1950020 ◽

Cited By ~ 13

Author(s):

Alok Kumar Shukla ◽

Pradeep Singh ◽

Manu Vardhan

Keyword(s):

Genetic Algorithm ◽

Feature Selection ◽

Classification Accuracy ◽

B Cell Lymphoma ◽

Feature Subset Selection ◽

Classification Model ◽

Significant Feature ◽

Support Vector ◽

Feature Subset ◽

Binary Genetic Algorithm

The explosion of the high-dimensional dataset in the scientific repository has been encouraging interdisciplinary research on data mining, pattern recognition and bioinformatics. The fundamental problem of the individual Feature Selection (FS) method is extracting informative features for classification model and to seek for the malignant disease at low computational cost. In addition, existing FS approaches overlook the fact that for a given cardinality, there can be several subsets with similar information. This paper introduces a novel hybrid FS algorithm, called Filter-Wrapper Feature Selection (FWFS) for a classification problem and also addresses the limitations of existing methods. In the proposed model, the front-end filter ranking method as Conditional Mutual Information Maximization (CMIM) selects the high ranked feature subset while the succeeding method as Binary Genetic Algorithm (BGA) accelerates the search in identifying the significant feature subsets. One of the merits of the proposed method is that, unlike an exhaustive method, it speeds up the FS procedure without lancing of classification accuracy on reduced dataset when a learning model is applied to the selected subsets of features. The efficacy of the proposed (FWFS) method is examined by Naive Bayes (NB) classifier which works as a fitness function. The effectiveness of the selected feature subset is evaluated using numerous classifiers on five biological datasets and five UCI datasets of a varied dimensionality and number of instances. The experimental results emphasize that the proposed method provides additional support to the significant reduction of the features and outperforms the existing methods. For microarray data-sets, we found the lowest classification accuracy is 61.24% on SRBCT dataset and highest accuracy is 99.32% on Diffuse large B-cell lymphoma (DLBCL). In UCI datasets, the lowest classification accuracy is 40.04% on the Lymphography using k-nearest neighbor (k-NN) and highest classification accuracy is 99.05% on the ionosphere using support vector machine (SVM).

Download Full-text

Artificial bee colony algorithm for feature selection and improved support vector machine for text classification

Information Discovery and Delivery ◽

10.1108/idd-09-2018-0045 ◽

2019 ◽

Vol 47 (3) ◽

pp. 154-170

Author(s):

Janani Balakumar ◽

S. Vijayarani Mohan

Keyword(s):

Support Vector Machine ◽

Feature Selection ◽

Text Classification ◽

Support Vector ◽

Data Sets ◽

Selection Algorithm ◽

Data Set ◽

Content Type ◽

Benchmark Data ◽

Bee Colony

Purpose Owing to the huge volume of documents available on the internet, text classification becomes a necessary task to handle these documents. To achieve optimal text classification results, feature selection, an important stage, is used to curtail the dimensionality of text documents by choosing suitable features. The main purpose of this research work is to classify the personal computer documents based on their content. Design/methodology/approach This paper proposes a new algorithm for feature selection based on artificial bee colony (ABCFS) to enhance the text classification accuracy. The proposed algorithm (ABCFS) is scrutinized with the real and benchmark data sets, which is contrary to the other existing feature selection approaches such as information gain and χ2 statistic. To justify the efficiency of the proposed algorithm, the support vector machine (SVM) and improved SVM classifier are used in this paper. Findings The experiment was conducted on real and benchmark data sets. The real data set was collected in the form of documents that were stored in the personal computer, and the benchmark data set was collected from Reuters and 20 Newsgroups corpus. The results prove the performance of the proposed feature selection algorithm by enhancing the text document classification accuracy. Originality/value This paper proposes a new ABCFS algorithm for feature selection, evaluates the efficiency of the ABCFS algorithm and improves the support vector machine. In this paper, the ABCFS algorithm is used to select the features from text (unstructured) documents. Although, there is no text feature selection algorithm in the existing work, the ABCFS algorithm is used to select the data (structured) features. The proposed algorithm will classify the documents automatically based on their content.

Download Full-text

Large Margin Feature Selection for Support Vector Machine

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.274.161 ◽

2013 ◽

Vol 274 ◽

pp. 161-164 ◽

Cited By ~ 1

Author(s):

Wei Pan ◽

Pei Jun Ma ◽

Xiao Hong Su

Keyword(s):

Feature Selection ◽

Optimal Solution ◽

Support Vector ◽

Data Sets ◽

Feature Subset ◽

L1 Norm ◽

A Algorithm ◽

Regularization Technique ◽

Feature Weight ◽

Selection For

Feature selection is an preprocessing step in pattern analysis and machine learning. In this paper, we design a algorithm for feature subset. We present L1-norm regularization technique for sparse feature weight. Margin loss are introduced to evaluate features, and we employs gradient descent to search the optimal solution to maximize margin. The proposed technique is tested on UCI data sets. Compared with four margin based loss functions for SVM, the proposed technique is effective and efficient.

Download Full-text

A Hybrid Feature Selection Method for Improve the Accuracy of Medical Classification Process

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.a9624.1111121 ◽

2021 ◽

Vol 11 (1) ◽

pp. 50-55

Author(s):

Maria Mohammad Yousef ◽

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Dimensionality Reduction ◽

Classification Accuracy ◽

Fitness Function ◽

Machine Learning Algorithms ◽

Feature Subset Selection ◽

High Dimensionality ◽

Support Vector ◽

Feature Subset

Generally, medical dataset classification has become one of the biggest problems in data mining research. Every database has a given number of features but it is observed that some of these features can be redundant and can be harmful as well as disrupt the process of classification and this problem is known as a high dimensionality problem. Dimensionality reduction in data preprocessing is critical for increasing the performance of machine learning algorithms. Besides the contribution of feature subset selection in dimensionality reduction gives a significant improvement in classification accuracy. In this paper, we proposed a new hybrid feature selection approach based on (GA assisted by KNN) to deal with issues of high dimensionality in biomedical data classification. The proposed method first applies the combination between GA and KNN for feature selection to find the optimal subset of features where the classification accuracy of the k-Nearest Neighbor (kNN) method is used as the fitness function for GA. After selecting the best-suggested subset of features, Support Vector Machine (SVM) are used as the classifiers. The proposed method experiments on five medical datasets of the UCI Machine Learning Repository. It is noted that the suggested technique performs admirably on these databases, achieving higher classification accuracy while using fewer features.

Download Full-text

Feature selection in classification using self-adaptive owl search optimization algorithm with elitism and mutation strategies

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-200258 ◽

2021 ◽

Vol 40 (1) ◽

pp. 535-550

Author(s):

Ashis Kumar Mandal ◽

Rikta Sen ◽

Basabi Chakraborty

Keyword(s):

Feature Selection ◽

Optimization Algorithm ◽

Classification Accuracy ◽

Heuristic Algorithms ◽

Support Vector ◽

Svm Classifier ◽

Binary Particle Swarm Optimization ◽

Feature Subset ◽

Search Optimization ◽

Self Adaptive

The fundamental aim of feature selection is to reduce the dimensionality of data by removing irrelevant and redundant features. As finding out the best subset of features from all possible subsets is computationally expensive, especially for high dimensional data sets, meta-heuristic algorithms are often used as a promising method for addressing the task. In this paper, a variant of recent meta-heuristic approach Owl Search Optimization algorithm (OSA) has been proposed for solving the feature selection problem within a wrapper-based framework. Several strategies are incorporated with an aim to strengthen BOSA (binary version of OSA) in searching the global best solution. The meta-parameter of BOSA is initialized dynamically and then adjusted using a self-adaptive mechanism during the search process. Besides, elitism and mutation operations are combined with BOSA to control the exploitation and exploration better. This improved BOSA is named in this paper as Modified Binary Owl Search Algorithm (MBOSA). Decision Tree (DT) classifier is used for wrapper based fitness function, and the final classification performance of the selected feature subset is evaluated by Support Vector Machine (SVM) classifier. Simulation experiments are conducted on twenty well-known benchmark datasets from UCI for the evaluation of the proposed algorithm, and the results are reported based on classification accuracy, the number of selected features, and execution time. In addition, BOSA along with three common meta-heuristic algorithms Binary Bat Algorithm (BBA), Binary Particle Swarm Optimization (BPSO), and Binary Genetic Algorithm (BGA) are used for comparison. Simulation results show that the proposed approach outperforms similar methods by reducing the number of features significantly while maintaining a comparable level of classification accuracy.

Download Full-text

Revealing False Positive Features in Epileptic EEG Identification

International Journal of Neural Systems ◽

10.1142/s0129065720500173 ◽

2020 ◽

Vol 30 (11) ◽

pp. 2050017 ◽

Cited By ~ 1

Author(s):

Jian Lian ◽

Yunfeng Shi ◽

Yan Zhang ◽

Weikuan Jia ◽

Xiaojun Fan ◽

...

Keyword(s):

Feature Selection ◽

Nearest Neighbor ◽

State Of The Art ◽

The State ◽

Support Vector ◽

Feature Subset ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Eeg Signals ◽

Eeg Classification

Feature selection plays a vital role in the detection and discrimination of epileptic seizures in electroencephalogram (EEG) signals. The state-of-the-art EEG classification techniques commonly entail the extraction of the multiple features that would be fed into classifiers. For some techniques, the feature selection strategies have been used to reduce the dimensionality of the entire feature space. However, most of these approaches focus on the performance of classifiers while neglecting the association between the feature and the EEG activity itself. To enhance the inner relationship between the feature subset and the epileptic EEG task with a promising classification accuracy, we propose a machine learning-based pipeline using a novel feature selection algorithm built upon a knockoff filter. First, a number of temporal, spectral, and spatial features are extracted from the raw EEG signals. Second, the proposed feature selection algorithm is exploited to obtain the optimal subgroup of features. Afterwards, three classifiers including [Formula: see text]-nearest neighbor (KNN), random forest (RF) and support vector machine (SVM) are used. The experimental results on the Bonn dataset demonstrate that the proposed approach outperforms the state-of-the-art techniques, with accuracy as high as 99.93% for normal and interictal EEG discrimination and 98.95% for interictal and ictal EEG classification. Meanwhile, it has achieved satisfactory sensitivity (95.67% in average), specificity (98.83% in average), and accuracy (98.89% in average) over the Freiburg dataset.

Download Full-text

Feature Selection Using Harmony Search for Script Identification from Handwritten Document Images

Journal of Intelligent Systems ◽

10.1515/jisys-2016-0070 ◽

2018 ◽

Vol 27 (3) ◽

pp. 465-488 ◽

Cited By ~ 5

Author(s):

Pawan Kumar Singh ◽

Supratim Das ◽

Ram Sarkar ◽

Mita Nasipuri

Keyword(s):

Feature Selection ◽

Harmony Search ◽

Distribution Networks ◽

Feature Subset Selection ◽

Support Vector ◽

Feature Subset ◽

Document Images ◽

Global Features ◽

Script Identification ◽

Handwritten Document

Abstract The feature selection process can be considered a problem of global combinatorial optimization in machine learning, which reduces the irrelevant, noisy, and non-contributing features, resulting in acceptable classification accuracy. Harmony search algorithm (HSA) is an evolutionary algorithm that is applied to various optimization problems such as scheduling, text summarization, water distribution networks, vehicle routing, etc. This paper presents a hybrid approach based on support vector machine and HSA for wrapper feature subset selection. This approach is used to select an optimized set of features from an initial set of features obtained by applying Modified log-Gabor filters on prepartitioned rectangular blocks of handwritten document images written in either of 12 official Indic scripts. The assessment justifies the need of feature selection for handwritten script identification where local and global features are computed without knowing the exact importance of features. The proposed approach is also compared with four well-known evolutionary algorithms, namely genetic algorithm, particle swarm optimization, tabu search, ant colony optimization, and two statistical feature dimensionality reduction techniques, namely greedy attribute search and principal component analysis. The acquired results show that the optimal set of features selected using HSA gives better accuracy in handwritten script recognition.

Download Full-text

Mahalanobis-ANOVA criterion for optimum feature subset selection in multi-class planetary gear fault diagnosis

Journal of Vibration and Control ◽

10.1177/10775463211029153 ◽

2021 ◽

pp. 107754632110291

Author(s):

Setti Suresh ◽

VPS Naidu

Keyword(s):

Support Vector Machine ◽

Feature Selection ◽

Fault Diagnosis ◽

Learning Algorithm ◽

Selection Criterion ◽

Feature Subset Selection ◽

Fault Classification ◽

Support Vector ◽

Feature Subset ◽

Gear Fault

The empirical analysis of a typical gear fault diagnosis of five different classes has been studied in this article. The analysis was used to develop novel feature selection criteria that provide an optimum feature subset over feature ranking genetic algorithms for improving the planetary gear fault classification accuracy. We have considered traditional approach in the fault diagnosis, where the raw vibration signal was divided into fixed-length epochs, and statistical time-domain features have been extracted from the segmented signal to represent the data in a compact discriminative form. Scale-invariant Mahalanobis distance–based feature selection using ANOVA statistic test was used as a feature selection criterion to find out the optimum feature subset. The Support Vector Machine Multi-Class machine learning algorithm was used as a classification technique to diagnose the gear faults. It has been observed that the highest gear fault classification accuracy of 99.89% (load case) was achieved by using the proposed Mahalanobis-ANOVA Criterion for optimum feature subset selection followed by Support Vector Machine Multi-Class algorithm. It is also noted that the developed feature selection criterion is a data-driven model which will contemplate all the nonlinearity in a signal. The fault diagnosis consistency of the proposed Support Vector Machine Multi-Class learning algorithm was ensured through 100 Monte Carlo runs, and the diagnostic ability of the classifier has been represented using confusion matrix and receiver operating characteristics.

Download Full-text