Addressing Low Dimensionality Feature Subset Selection: ReliefF(-k) or Extended Correlation-Based Feature Selection (eCFS)?

Author(s):  
Antonio J. Tallón-Ballesteros ◽  
Luís Cavique ◽  
Simon Fong
2016 ◽  
Vol 2016 ◽  
pp. 1-6 ◽  
Author(s):  
Gürcan Yavuz ◽  
Doğan Aydin

Optimal feature subset selection is an important and difficult task for pattern classification, data mining, and machine intelligence applications. The objective of feature subset selection is to eliminate irrelevant and noisy features in order to select an optimal feature subset and increase accuracy. A large number of features in a dataset increases computational complexity, leading to performance degradation. In this paper, to overcome this problem, an angle modulation technique is used to reduce the feature subset selection problem to a four-dimensional continuous optimization problem instead of representing it as a high-dimensional bit vector. To demonstrate the effectiveness of this problem representation and to determine the efficiency of the proposed method, six variants of the Artificial Bee Colony (ABC) algorithm employ angle modulation for feature selection. Experimental results on six high-dimensional datasets show that the angle-modulated ABC algorithms improved classification accuracy with smaller feature subsets.
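The core of this reduction is the decoding step: a four-parameter generating function is sampled once per feature and thresholded at zero to produce a bit mask. A minimal sketch of that step follows, assuming the generating function commonly used in angle-modulated binary optimizers; the ABC search loop and fitness evaluation are omitted, and all names are illustrative.

```python
# A minimal sketch of angle modulation decoding, assuming the standard
# generator g(x) = sin(2*pi*(x-a)*b*cos(2*pi*(x-a)*c)) + d; the continuous
# ABC optimizer that searches over (a, b, c, d) is not reproduced here.
import numpy as np

def decode_angle_modulation(params, n_features):
    """Map a 4-D continuous vector (a, b, c, d) to an n_features-long bit mask."""
    a, b, c, d = params
    x = np.arange(n_features)                      # one sample point per feature
    g = np.sin(2 * np.pi * (x - a) * b * np.cos(2 * np.pi * (x - a) * c)) + d
    return g > 0                                   # feature i is selected iff g(i) > 0

# Example: decode one candidate solution produced by the continuous optimizer
mask = decode_angle_modulation([0.1, 0.5, 0.3, -0.2], n_features=20)
print(mask.astype(int))
```

Because the optimizer only ever manipulates the four real-valued parameters, the search space stays four-dimensional regardless of how many features the dataset has.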


2021 ◽  
pp. 08-16
Author(s):  
Mohamed Abdel-Basset ◽  
Mohamed Elhoseny

In the current epidemic situation, people face several mental disorders related to Depression, Anxiety, and Stress (DAS). Numerous scales have been developed for measuring DAS levels, and DAS-21 is one of them. At the same time, machine learning (ML) models are widely applied to solve classification problems efficiently, and feature selection (FS) approaches can be designed to improve classifier results. In this context, this paper develops an intelligent feature selection with ML-based risk management (IFSML-RM) technique for DAS prediction. The IFSML-RM technique follows a two-stage process: quantum elephant herd optimization-based FS (QEHO-FS) and decision tree (DT) based classification. In the first stage, the QEHO algorithm selects a valuable subset of features from the input data. The chosen features are then fed into the DT classifier to determine the presence or absence of DAS. A detailed experimental process is carried out on a benchmark dataset, and the results demonstrate the superiority of the IFSML-RM technique across different performance measures.
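The two-stage structure itself is straightforward to sketch. Below is a minimal illustration with a mutual-information ranking standing in for the QEHO-based selector (the quantum elephant herd optimizer is a bespoke metaheuristic not reproduced here), and synthetic data as a placeholder for DAS-21 responses.

```python
# A minimal sketch of the two-stage pipeline: feature selection followed by
# decision-tree classification. SelectKBest with mutual information is an
# illustrative stand-in for QEHO-FS, not the paper's method.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=40, n_informative=8,
                           random_state=0)       # placeholder for DAS-21 data

pipeline = make_pipeline(
    SelectKBest(mutual_info_classif, k=10),      # stage 1: select a feature subset
    DecisionTreeClassifier(random_state=0),      # stage 2: DT-based classification
)
print(cross_val_score(pipeline, X, y, cv=5).mean())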


Author(s):  
Donia Augustine

As applications producing high-dimensional data have increased tremendously, clustering data under reduced memory has become a necessity. Feature selection is a typical approach to clustering high-dimensional data: it identifies a subset of the most relevant features from the entire feature set. Our approach suggests a method to efficiently cluster high-dimensional data under reduced memory. An N-dimensional feature selection algorithm, NDFS, is used to identify the subset of relevant features. Feature selection helps remove irrelevant and redundant features from each cluster. In the initial phase of the NDFS algorithm, features are divided into clusters using graph-theoretic clustering methods. The final phase generates the subset of relevant features that are closely related to the target class. Features in different clusters are relatively independent. In particular, a minimum spanning tree is constructed to efficiently manipulate the subset of features. Traditionally, feature subset selection research has focused on searching for relevant features. The clustering-based strategy of NDFS has a high probability of producing a subset of useful and independent features.
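The MST-based clustering idea can be sketched concretely: treat features as nodes, weight edges by dissimilarity, build the minimum spanning tree, cut weak links to form clusters, and keep one target-relevant representative per cluster. The sketch below uses absolute correlation as the (assumed) similarity measure and an illustrative cut threshold; it is not the NDFS algorithm itself.

```python
# A minimal sketch of MST-based feature clustering in the spirit of the
# description above; the correlation measure and threshold are assumptions.
import numpy as np
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree

def mst_feature_subset(X, y, cut_threshold=0.7):
    n = X.shape[1]
    corr = np.abs(np.corrcoef(X, rowvar=False))       # feature-feature similarity
    dist = np.clip(1.0 - corr, 1e-9, None)            # distance = 1 - |corr|
    np.fill_diagonal(dist, 0)                         # no self-edges
    mst = minimum_spanning_tree(dist).toarray()
    mst[mst > cut_threshold] = 0                      # cut weak (long) edges
    n_clusters, labels = connected_components(mst, directed=False)
    relevance = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(n)])
    # keep, per cluster, the feature most closely related to the target class
    return [int(np.argmax(np.where(labels == c, relevance, -1.0)))
            for c in range(n_clusters)]

rng = np.random.default_rng(0)
X, y = rng.random((100, 12)), rng.integers(0, 2, 100)
print(sorted(mst_feature_subset(X, y)))
```

Cutting MST edges rather than thresholding the full graph keeps the clustering step near-linear in the number of retained edges, which is what makes the approach attractive at high dimensionality.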


2013 ◽  
Vol 47 ◽  
pp. 1-34 ◽  
Author(s):  
G. Wang ◽  
Q. Song ◽  
H. Sun ◽  
X. Zhang ◽  
B. Xu ◽  
...  

Many feature subset selection (FSS) algorithms have been proposed, but not all of them are appropriate for a given feature selection problem. At the same time, there is so far rarely a good way to choose appropriate FSS algorithms for the problem at hand. Thus, automatic FSS algorithm recommendation is very important and practically useful. In this paper, a meta-learning-based FSS algorithm recommendation method is presented. The proposed method first identifies the datasets most similar to the one at hand using the k-nearest neighbor classification algorithm, with distances among datasets calculated from commonly used dataset characteristics. It then ranks all the candidate FSS algorithms according to their performance on these similar datasets and chooses the best-performing algorithms as the appropriate ones. The performance of the candidate FSS algorithms is evaluated by a multi-criteria metric that takes into account not only the classification accuracy over the selected features but also the runtime of feature selection and the number of selected features. The proposed recommendation method is extensively tested on 115 real-world datasets with 22 well-known and frequently used FSS algorithms and five representative classifiers. The results show the effectiveness of the proposed FSS algorithm recommendation method.
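The recommendation step reduces to a nearest-neighbor lookup over a meta-feature table plus an aggregation of per-dataset algorithm scores. A minimal sketch follows, assuming a precomputed table of dataset characteristics and multi-criteria performance scores; both are placeholders for the ones used in the paper.

```python
# A minimal sketch of k-NN-based FSS algorithm recommendation; meta-features
# and the multi-criteria score are illustrative assumptions.
import numpy as np

def recommend_fss(meta_features, performance, query, k=3):
    """meta_features: (n_datasets, n_meta); performance: (n_datasets, n_algos);
    query: meta-feature vector of the dataset at hand."""
    dists = np.linalg.norm(meta_features - query, axis=1)  # distance to each dataset
    neighbors = np.argsort(dists)[:k]                      # k most similar datasets
    avg_score = performance[neighbors].mean(axis=0)        # aggregate performance
    return np.argsort(avg_score)[::-1]                     # algorithm ids, best first

rng = np.random.default_rng(0)
meta = rng.random((115, 5))      # e.g., #instances, #features, class entropy, ...
perf = rng.random((115, 22))     # multi-criteria scores for 22 FSS algorithms
print(recommend_fss(meta, perf, query=rng.random(5)))
```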


Data ◽  
2019 ◽  
Vol 4 (2) ◽  
pp. 76 ◽  
Author(s):  
Mehreen Naz ◽  
Kashif Zafar ◽  
Ayesha Khan

Feature subset selection is the process of choosing a set of relevant features from a high-dimensional dataset to improve classifier performance. Meaningful words extracted from data form a set of features for sentiment analysis. Many evolutionary algorithms, such as the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO), have been applied to the feature subset selection problem, but computational performance can still be improved. This research presents a solution to the feature subset selection problem for sentiment classification using ensemble-based classifiers. It consists of a hybrid technique combining minimum redundancy maximum relevance (mRMR) and Forest Optimization Algorithm (FOA)-based feature selection. Ensemble-based classification is implemented to optimize the results of the individual classifiers. The Forest Optimization Algorithm has been applied as a feature selection technique to various classification datasets from the UCI machine learning repository. The classifiers used in the ensembles for the UCI repository datasets are k-Nearest Neighbor (k-NN) and Naïve Bayes (NB). For sentiment classification, a 15–20% improvement has been recorded. The dataset used for sentiment classification is Blitzer's dataset, consisting of reviews of electronic products. The results are further improved by an ensemble of k-NN, NB, and Support Vector Machine (SVM), reaching 95% accuracy on the sentiment classification task.
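The mRMR half of the hybrid is a standard greedy criterion: at each step, pick the feature that maximizes relevance to the label minus its average redundancy with the features already chosen. A minimal sketch follows, assuming discretized features and mutual information as the measure; the FOA search and the ensemble step are not reproduced.

```python
# A minimal sketch of greedy mRMR selection over discrete-valued features.
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import mutual_info_score

def mrmr(X, y, n_select):
    relevance = mutual_info_classif(X, y, discrete_features=True, random_state=0)
    selected = [int(np.argmax(relevance))]            # start from the most relevant
    remaining = set(range(X.shape[1])) - set(selected)
    while len(selected) < n_select:
        def score(j):  # relevance minus mean redundancy with the chosen features
            redundancy = np.mean([mutual_info_score(X[:, j], X[:, s])
                                  for s in selected])
            return relevance[j] - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.discard(best)
    return selected

X = np.random.randint(0, 4, (200, 12))    # e.g., discretized term-frequency bins
y = np.random.randint(0, 2, 200)          # e.g., positive/negative sentiment label
print(mrmr(X, y, n_select=5))
```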


2015 ◽  
Vol 2015 ◽  
pp. 1-9 ◽  
Author(s):  
Senthilkumar Devaraj ◽  
S. Paulraj

Multidimensional medical data classification has recently received increased attention from researchers working on machine learning and data mining. In a multidimensional dataset (MDD), each instance is associated with multiple class values. Due to its complex nature, feature selection and classifiers built from an MDD are typically more expensive and time-consuming. Therefore, a robust feature selection technique is needed for selecting an optimal single subset of the features of the MDD for further analysis or for designing a classifier. In this paper, an efficient feature selection algorithm is proposed for the classification of MDDs. The proposed multidimensional feature subset selection (MFSS) algorithm yields a unique feature subset for further analysis or for building a classifier, and offers a computational advantage on MDDs compared with existing feature selection algorithms. The proposed work is applied to benchmark multidimensional datasets. Using MFSS, the number of features was reduced to between 3% and 30% of the original feature set. In conclusion, the results show that MFSS is an efficient feature selection algorithm that preserves classification accuracy even with the reduced number of features. The MFSS algorithm is also suitable for both problem transformation and algorithm adaptation, and it has great potential in applications generating multidimensional datasets.
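The distinctive requirement here is one shared subset that serves several class dimensions at once. As an illustrative stand-in for MFSS (not the paper's exact criterion), one can average a per-target relevance score across all class dimensions and keep the top-ranked fraction of features:

```python
# A minimal sketch of scoring features against multiple class dimensions;
# averaging mutual information across targets is an illustrative choice.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def shared_subset(X, Y, keep_fraction=0.3):
    """Y has one column per class dimension of the multidimensional dataset."""
    scores = np.mean([mutual_info_classif(X, Y[:, d], random_state=0)
                      for d in range(Y.shape[1])], axis=0)
    k = max(1, int(keep_fraction * X.shape[1]))
    return np.argsort(scores)[::-1][:k]        # one subset shared by all targets

X = np.random.rand(150, 20)
Y = np.random.randint(0, 3, (150, 4))          # four class dimensions per instance
print(shared_subset(X, Y))
```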


2013 ◽  
Vol 774-776 ◽  
pp. 1816-1822
Author(s):  
Kai Yang ◽  
Yong Long Jin ◽  
Zhi Jun He

Concept lattice is the core data structure of formal concept analysis and represents the order relationship between concepts. Feature selection has been a focus of research in machine learning, and it has been shown to be very effective in removing irrelevant and redundant features, increasing efficiency in the learning process, and producing more intelligible learned results. This paper proposes a new briefest feature subset selection algorithm based on a preference attribute, built on concept lattice theory. Users can put forward a preference attribute according to their subjective experience, and the algorithm discovers all the briefest feature subsets containing the given attribute. It first finds certain special concept pairs and calculates their waned-value hypergraph, then obtains the minimal transversal of the hypergraph as the result. A practical example shows that the method is cogent and effective.
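The final step the algorithm relies on, computing all minimal transversals (minimal hitting sets) of a hypergraph, can be sketched directly. The brute-force enumeration below is only suitable for small attribute sets, and the concept-pair and waned-value construction is not reproduced.

```python
# A minimal sketch of enumerating all minimal transversals of a hypergraph:
# subsets are checked in increasing size, so any hitting set with no smaller
# hitting subset already found is minimal.
from itertools import combinations

def minimal_transversals(vertices, hyperedges):
    hits = lambda s: all(s & e for e in hyperedges)   # s intersects every edge
    found = []
    for r in range(1, len(vertices) + 1):
        for combo in combinations(sorted(vertices), r):
            s = set(combo)
            if hits(s) and not any(m <= s for m in found):  # keep only minimal sets
                found.append(s)
    return found

edges = [{'a', 'b'}, {'b', 'c'}, {'c', 'd'}]
print(minimal_transversals({'a', 'b', 'c', 'd'}, edges))
# -> [{'a', 'c'}, {'b', 'c'}, {'b', 'd'}]
```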


2015 ◽  
Vol 25 (09n10) ◽  
pp. 1531-1550 ◽  
Author(s):  
Kehan Gao ◽  
Taghi M. Khoshgoftaar ◽  
Amri Napolitano

Defect prediction is an important process activity frequently used for improving the quality and reliability of software products. Defect prediction results provide a list of fault-prone modules, which helps project managers better utilize valuable project resources. In the software quality modeling process, high dimensionality and class imbalance are two potential problems in data repositories. In this study, we investigate three data preprocessing approaches, in which feature selection is combined with data sampling, to overcome these problems in the context of software quality estimation. These three approaches are: Approach 1 — sampling performed prior to feature selection, but retaining the unsampled data instances; Approach 2 — sampling performed prior to feature selection, retaining the sampled data instances; and Approach 3 — sampling performed after feature selection. A comparative investigation is presented for evaluating the three approaches. In the experiments, we employed three sampling methods (random undersampling, random oversampling, and synthetic minority oversampling), each combined with a filter-based feature subset selection technique called correlation-based feature selection. We built the defect prediction models using five common classification algorithms. The case study was based on software metrics and defect data collected from multiple releases of a real-world software system. The results demonstrated that the type of sampling method used in data preprocessing significantly affected the performance of the combination approaches. When the random undersampling technique was used, Approach 1 performed better than the other two approaches. However, when feature selection was used in conjunction with an oversampling method (random oversampling or synthetic minority oversampling), Approach 3 is strongly recommended.
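The three orderings differ only in where the sampler and the selector sit in the pipeline. A minimal sketch follows, with SelectKBest standing in for correlation-based feature selection and random undersampling from imbalanced-learn as the sampling step; dataset and parameters are illustrative.

```python
# A minimal sketch of the three sampling/feature-selection orderings.
from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

X, y = make_classification(n_samples=500, n_features=30, weights=[0.9, 0.1],
                           random_state=0)        # imbalanced "defect" data
selector = SelectKBest(f_classif, k=8)            # stand-in for CFS
sampler = RandomUnderSampler(random_state=0)

# Approach 1: sample only to guide selection, then keep the unsampled instances
Xs, ys = sampler.fit_resample(X, y)
X1, y1 = selector.fit(Xs, ys).transform(X), y

# Approach 2: sample first, then select features on (and keep) the sampled data
X2, y2 = selector.fit_transform(Xs, ys), ys

# Approach 3: select features on the full data, then sample the reduced data
Xr = selector.fit_transform(X, y)
X3, y3 = sampler.fit_resample(Xr, y)

print(X1.shape, X2.shape, X3.shape)
```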


2020 ◽  
Vol 8 (2S7) ◽  
pp. 2237-2240

In diagnosis and prediction systems, algorithms working on datasets with a high number of dimensions tend to take more time than those with fewer dimensions. Feature subset selection algorithms enhance the efficiency of machine learning algorithms in prediction problems by selecting a subset of the total features, thus pruning redundancy and noise. In this article, such a feature subset selection method is proposed and implemented to diagnose breast cancer using the Support Vector Machine (SVM) and K-Nearest Neighbor (KNN) algorithms. The feature selection algorithm is based on Social Group Optimization (SGO), an evolutionary algorithm. The proposed model achieves higher accuracy in diagnosing breast cancer when compared to other feature-selection-based machine learning algorithms.
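Any such wrapper method needs a fitness function that scores a candidate feature mask by classifier accuracy. A minimal sketch of that evaluation step follows, using the standard scikit-learn breast cancer dataset and an SVM; the SGO population update itself is not reproduced, and the settings are illustrative.

```python
# A minimal sketch of the wrapper fitness evaluation: a candidate bit mask is
# scored by cross-validated SVM accuracy on the selected columns.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

def fitness(mask):
    if not mask.any():                 # empty feature subsets are invalid
        return 0.0
    return cross_val_score(SVC(), X[:, mask], y, cv=5).mean()

rng = np.random.default_rng(0)
mask = rng.random(X.shape[1]) < 0.5    # one candidate solution from the optimizer
print(int(mask.sum()), "features ->", round(fitness(mask), 4))
```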

