Addressing Low Dimensionality Feature Subset Selection: ReliefF(-k) or Extended Correlation-Based Feature Selection (eCFS)?

Author(s):  
Antonio J. Tallón-Ballesteros ◽  
Luís Cavique ◽  
Simon Fong
2016 ◽  
Vol 2016 ◽  
pp. 1-6 ◽  
Author(s):  
Gürcan Yavuz ◽  
Doğan Aydin

Optimal feature subset selection is an important and difficult task for pattern classification, data mining, and machine intelligence applications. The objective of feature subset selection is to eliminate irrelevant and noisy features in order to select an optimal feature subset and increase accuracy. A large number of features in a dataset increases computational complexity, leading to performance degradation. In this paper, to overcome this problem, an angle modulation technique is used to reduce the feature subset selection problem to a four-dimensional continuous optimization problem instead of representing it as a high-dimensional bit vector. To demonstrate the effectiveness of this problem representation and to determine the efficiency of the proposed method, six variants of the Artificial Bee Colony (ABC) algorithm employ angle modulation for feature selection. Experimental results on six high-dimensional datasets show that the angle-modulated ABC algorithms improved classification accuracy with smaller feature subsets.
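The core of this reduction is the decoding step: a four-parameter generating function is sampled once per feature and thresholded at zero to produce a bit mask. A minimal sketch of that step follows, assuming the generating function commonly used in angle-modulated binary optimizers; the ABC search loop and fitness evaluation are omitted, and all names are illustrative.

```python
# A minimal sketch of angle modulation decoding, assuming the standard
# generator g(x) = sin(2*pi*(x-a)*b*cos(2*pi*(x-a)*c)) + d; the continuous
# ABC optimizer that searches over (a, b, c, d) is not reproduced here.
import numpy as np

def decode_angle_modulation(params, n_features):
    """Map a 4-D continuous vector (a, b, c, d) to an n_features-long bit mask."""
    a, b, c, d = params
    x = np.arange(n_features)                      # one sample point per feature
    g = np.sin(2 * np.pi * (x - a) * b * np.cos(2 * np.pi * (x - a) * c)) + d
    return g > 0                                   # feature i is selected iff g(i) > 0

# Example: decode one candidate solution produced by the continuous optimizer
mask = decode_angle_modulation([0.1, 0.5, 0.3, -0.2], n_features=20)
print(mask.astype(int))
```

Because the optimizer only ever manipulates the four real-valued parameters, the search space stays four-dimensional regardless of how many features the dataset has.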


2021 ◽  
pp. 08-16
Author(s):  
Mohamed Abdel-Basset ◽  
Mohamed Elhoseny

In the current epidemic situation, people face several mental disorders related to Depression, Anxiety, and Stress (DAS). Numerous scales have been developed for measuring DAS levels, and DAS-21 is one of them. At the same time, machine learning (ML) models are widely applied to solve classification problems efficiently, and feature selection (FS) approaches can be designed to improve classifier results. In this context, this paper develops an intelligent feature selection with ML-based risk management (IFSML-RM) technique for DAS prediction. The IFSML-RM technique follows a two-stage process: quantum elephant herd optimization-based FS (QEHO-FS) and decision tree (DT) based classification. In the first stage, the QEHO algorithm selects a valuable subset of features from the input data. The chosen features are then fed into the DT classifier to determine the presence or absence of DAS. A detailed experimental process is carried out on a benchmark dataset, and the results demonstrate the superiority of the IFSML-RM technique across different performance measures.
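The two-stage structure itself is straightforward to sketch. Below is a minimal illustration with a mutual-information ranking standing in for the QEHO-based selector (the quantum elephant herd optimizer is a bespoke metaheuristic not reproduced here), and synthetic data as a placeholder for DAS-21 responses.

```python
# A minimal sketch of the two-stage pipeline: feature selection followed by
# decision-tree classification. SelectKBest with mutual information is an
# illustrative stand-in for QEHO-FS, not the paper's method.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=40, n_informative=8,
                           random_state=0)       # placeholder for DAS-21 data

pipeline = make_pipeline(
    SelectKBest(mutual_info_classif, k=10),      # stage 1: select a feature subset
    DecisionTreeClassifier(random_state=0),      # stage 2: DT-based classification
)
print(cross_val_score(pipeline, X, y, cv=5).mean())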


Author(s):  
Donia Augustine

As applications producing high-dimensional data have increased tremendously, clustering data under reduced memory has become a necessity. Feature selection is a typical approach to clustering high-dimensional data: it identifies a subset of the most relevant features from the entire feature set. Our approach suggests a method to efficiently cluster high-dimensional data under reduced memory. An N-dimensional feature selection algorithm, NDFS, is used to identify the subset of relevant features. Feature selection helps remove irrelevant and redundant features from each cluster. In the initial phase of the NDFS algorithm, features are divided into clusters using graph-theoretic clustering methods. The final phase generates the subset of relevant features that are closely related to the target class. Features in different clusters are relatively independent. In particular, a minimum spanning tree is constructed to efficiently manipulate the subset of features. Traditionally, feature subset selection research has focused on searching for relevant features. The clustering-based strategy of NDFS has a high probability of producing a subset of useful and independent features.
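The MST-based clustering idea can be sketched concretely: treat features as nodes, weight edges by dissimilarity, build the minimum spanning tree, cut weak links to form clusters, and keep one target-relevant representative per cluster. The sketch below uses absolute correlation as the (assumed) similarity measure and an illustrative cut threshold; it is not the NDFS algorithm itself.

```python
# A minimal sketch of MST-based feature clustering in the spirit of the
# description above; the correlation measure and threshold are assumptions.
import numpy as np
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree

def mst_feature_subset(X, y, cut_threshold=0.7):
    n = X.shape[1]
    corr = np.abs(np.corrcoef(X, rowvar=False))       # feature-feature similarity
    dist = np.clip(1.0 - corr, 1e-9, None)            # distance = 1 - |corr|
    np.fill_diagonal(dist, 0)                         # no self-edges
    mst = minimum_spanning_tree(dist).toarray()
    mst[mst > cut_threshold] = 0                      # cut weak (long) edges
    n_clusters, labels = connected_components(mst, directed=False)
    relevance = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(n)])
    # keep, per cluster, the feature most closely related to the target class
    return [int(np.argmax(np.where(labels == c, relevance, -1.0)))
            for c in range(n_clusters)]

rng = np.random.default_rng(0)
X, y = rng.random((100, 12)), rng.integers(0, 2, 100)
print(sorted(mst_feature_subset(X, y)))
```

Cutting MST edges rather than thresholding the full graph keeps the clustering step near-linear in the number of retained edges, which is what makes the approach attractive at high dimensionality.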


2013 ◽  
Vol 47 ◽  
pp. 1-34 ◽  
Author(s):  
G. Wang ◽  
Q. Song ◽  
H. Sun ◽  
X. Zhang ◽  
B. Xu ◽  
...  

Many feature subset selection (FSS) algorithms have been proposed, but not all of them are appropriate for a given feature selection problem. At the same time, there is so far rarely a good way to choose appropriate FSS algorithms for the problem at hand. Thus, automatic FSS algorithm recommendation is very important and practically useful. In this paper, a meta-learning-based FSS algorithm recommendation method is presented. The proposed method first identifies the datasets most similar to the one at hand using the k-nearest neighbor classification algorithm, with distances among datasets calculated from commonly used dataset characteristics. It then ranks all the candidate FSS algorithms according to their performance on these similar datasets and chooses the best-performing algorithms as the appropriate ones. The performance of the candidate FSS algorithms is evaluated by a multi-criteria metric that takes into account not only the classification accuracy over the selected features but also the runtime of feature selection and the number of selected features. The proposed recommendation method is extensively tested on 115 real-world datasets with 22 well-known and frequently used FSS algorithms and five representative classifiers. The results show the effectiveness of the proposed FSS algorithm recommendation method.
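The recommendation step reduces to a nearest-neighbor lookup over a meta-feature table plus an aggregation of per-dataset algorithm scores. A minimal sketch follows, assuming a precomputed table of dataset characteristics and multi-criteria performance scores; both are placeholders for the ones used in the paper.

```python
# A minimal sketch of k-NN-based FSS algorithm recommendation; meta-features
# and the multi-criteria score are illustrative assumptions.
import numpy as np

def recommend_fss(meta_features, performance, query, k=3):
    """meta_features: (n_datasets, n_meta); performance: (n_datasets, n_algos);
    query: meta-feature vector of the dataset at hand."""
    dists = np.linalg.norm(meta_features - query, axis=1)  # distance to each dataset
    neighbors = np.argsort(dists)[:k]                      # k most similar datasets
    avg_score = performance[neighbors].mean(axis=0)        # aggregate performance
    return np.argsort(avg_score)[::-1]                     # algorithm ids, best first

rng = np.random.default_rng(0)
meta = rng.random((115, 5))      # e.g., #instances, #features, class entropy, ...
perf = rng.random((115, 22))     # multi-criteria scores for 22 FSS algorithms
print(recommend_fss(meta, perf, query=rng.random(5)))
```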


Data ◽  
2019 ◽  
Vol 4 (2) ◽  
pp. 76 ◽  
Author(s):  
Mehreen Naz ◽  
Kashif Zafar ◽  
Ayesha Khan

Feature subset selection is the process of choosing a set of relevant features from a high-dimensional dataset to improve classifier performance. Meaningful words extracted from data form a set of features for sentiment analysis. Many evolutionary algorithms, such as the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO), have been applied to the feature subset selection problem, but computational performance can still be improved. This research presents a solution to the feature subset selection problem for sentiment classification using ensemble-based classifiers. It consists of a hybrid technique combining minimum redundancy maximum relevance (mRMR) and Forest Optimization Algorithm (FOA)-based feature selection. Ensemble-based classification is implemented to optimize the results of the individual classifiers. The Forest Optimization Algorithm has been applied as a feature selection technique to various classification datasets from the UCI machine learning repository. The classifiers used in the ensembles for the UCI repository datasets are k-Nearest Neighbor (k-NN) and Naïve Bayes (NB). For sentiment classification, a 15–20% improvement has been recorded. The dataset used for sentiment classification is Blitzer's dataset, consisting of reviews of electronic products. The results are further improved by an ensemble of k-NN, NB, and Support Vector Machine (SVM), reaching 95% accuracy on the sentiment classification task.
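The mRMR half of the hybrid is a standard greedy criterion: at each step, pick the feature that maximizes relevance to the label minus its average redundancy with the features already chosen. A minimal sketch follows, assuming discretized features and mutual information as the measure; the FOA search and the ensemble step are not reproduced.

```python
# A minimal sketch of greedy mRMR selection over discrete-valued features.
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import mutual_info_score

def mrmr(X, y, n_select):
    relevance = mutual_info_classif(X, y, discrete_features=True, random_state=0)
    selected = [int(np.argmax(relevance))]            # start from the most relevant
    remaining = set(range(X.shape[1])) - set(selected)
    while len(selected) < n_select:
        def score(j):  # relevance minus mean redundancy with the chosen features
            redundancy = np.mean([mutual_info_score(X[:, j], X[:, s])
                                  for s in selected])
            return relevance[j] - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.discard(best)
    return selected

X = np.random.randint(0, 4, (200, 12))    # e.g., discretized term-frequency bins
y = np.random.randint(0, 2, 200)          # e.g., positive/negative sentiment label
print(mrmr(X, y, n_select=5))
```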


2015 ◽  
Vol 2015 ◽  
pp. 1-9 ◽  
Author(s):  
Senthilkumar Devaraj ◽  
S. Paulraj

Multidimensional medical data classification has recently received increased attention from researchers working on machine learning and data mining. In a multidimensional dataset (MDD), each instance is associated with multiple class values. Due to its complex nature, feature selection and classifiers built from an MDD are typically more expensive and time-consuming. Therefore, a robust feature selection technique is needed for selecting an optimal single subset of the features of the MDD for further analysis or for designing a classifier. In this paper, an efficient feature selection algorithm is proposed for the classification of MDDs. The proposed multidimensional feature subset selection (MFSS) algorithm yields a unique feature subset for further analysis or for building a classifier, and offers a computational advantage on MDDs compared with existing feature selection algorithms. The proposed work is applied to benchmark multidimensional datasets. Using MFSS, the number of features was reduced to between 3% and 30% of the original feature set. In conclusion, the results show that MFSS is an efficient feature selection algorithm that preserves classification accuracy even with the reduced number of features. The MFSS algorithm is also suitable for both problem transformation and algorithm adaptation, and it has great potential in applications generating multidimensional datasets.
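The distinctive requirement here is one shared subset that serves several class dimensions at once. As an illustrative stand-in for MFSS (not the paper's exact criterion), one can average a per-target relevance score across all class dimensions and keep the top-ranked fraction of features:

```python
# A minimal sketch of scoring features against multiple class dimensions;
# averaging mutual information across targets is an illustrative choice.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def shared_subset(X, Y, keep_fraction=0.3):
    """Y has one column per class dimension of the multidimensional dataset."""
    scores = np.mean([mutual_info_classif(X, Y[:, d], random_state=0)
                      for d in range(Y.shape[1])], axis=0)
    k = max(1, int(keep_fraction * X.shape[1]))
    return np.argsort(scores)[::-1][:k]        # one subset shared by all targets

X = np.random.rand(150, 20)
Y = np.random.randint(0, 3, (150, 4))          # four class dimensions per instance
print(shared_subset(X, Y))
```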


2013 ◽  
Vol 774-776 ◽  
pp. 1816-1822
Author(s):  
Kai Yang ◽  
Yong Long Jin ◽  
Zhi Jun He

Concept lattice is the core data structure of formal concept analysis and represents the order relationship between concepts. Feature selection has been a focus of research in machine learning, and it has been shown to be very effective in removing irrelevant and redundant features, increasing efficiency in the learning process, and producing more intelligible learned results. This paper proposes a new briefest feature subset selection algorithm based on a preference attribute, built on concept lattice theory. Users can put forward a preference attribute according to their subjective experience, and the algorithm discovers all the briefest feature subsets containing the given attribute. It first finds certain special concept pairs and calculates their waned-value hypergraph, then obtains the minimal transversal of the hypergraph as the result. A practical example shows that the method is cogent and effective.
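The final step the algorithm relies on, computing all minimal transversals (minimal hitting sets) of a hypergraph, can be sketched directly. The brute-force enumeration below is only suitable for small attribute sets, and the concept-pair and waned-value construction is not reproduced.

```python
# A minimal sketch of enumerating all minimal transversals of a hypergraph:
# subsets are checked in increasing size, so any hitting set with no smaller
# hitting subset already found is minimal.
from itertools import combinations

def minimal_transversals(vertices, hyperedges):
    hits = lambda s: all(s & e for e in hyperedges)   # s intersects every edge
    found = []
    for r in range(1, len(vertices) + 1):
        for combo in combinations(sorted(vertices), r):
            s = set(combo)
            if hits(s) and not any(m <= s for m in found):  # keep only minimal sets
                found.append(s)
    return found

edges = [{'a', 'b'}, {'b', 'c'}, {'c', 'd'}]
print(minimal_transversals({'a', 'b', 'c', 'd'}, edges))
# -> [{'a', 'c'}, {'b', 'c'}, {'b', 'd'}]
```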


2015 ◽  
Vol 25 (09n10) ◽  
pp. 1531-1550 ◽  
Author(s):  
Kehan Gao ◽  
Taghi M. Khoshgoftaar ◽  
Amri Napolitano

Defect prediction is an important process activity frequently used for improving the quality and reliability of software products. Defect prediction results provide a list of fault-prone modules, which helps project managers better utilize valuable project resources. In the software quality modeling process, high dimensionality and class imbalance are two potential problems in data repositories. In this study, we investigate three data preprocessing approaches, in which feature selection is combined with data sampling, to overcome these problems in the context of software quality estimation. These three approaches are: Approach 1 — sampling performed prior to feature selection, but retaining the unsampled data instances; Approach 2 — sampling performed prior to feature selection, retaining the sampled data instances; and Approach 3 — sampling performed after feature selection. A comparative investigation is presented for evaluating the three approaches. In the experiments, we employed three sampling methods (random undersampling, random oversampling, and synthetic minority oversampling), each combined with a filter-based feature subset selection technique called correlation-based feature selection. We built the defect prediction models using five common classification algorithms. The case study was based on software metrics and defect data collected from multiple releases of a real-world software system. The results demonstrated that the type of sampling method used in data preprocessing significantly affected the performance of the combination approaches. When the random undersampling technique was used, Approach 1 performed better than the other two approaches. However, when feature selection was used in conjunction with an oversampling method (random oversampling or synthetic minority oversampling), Approach 3 is strongly recommended.
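The three orderings differ only in where the sampler and the selector sit in the pipeline. A minimal sketch follows, with SelectKBest standing in for correlation-based feature selection and random undersampling from imbalanced-learn as the sampling step; dataset and parameters are illustrative.

```python
# A minimal sketch of the three sampling/feature-selection orderings.
from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

X, y = make_classification(n_samples=500, n_features=30, weights=[0.9, 0.1],
                           random_state=0)        # imbalanced "defect" data
selector = SelectKBest(f_classif, k=8)            # stand-in for CFS
sampler = RandomUnderSampler(random_state=0)

# Approach 1: sample only to guide selection, then keep the unsampled instances
Xs, ys = sampler.fit_resample(X, y)
X1, y1 = selector.fit(Xs, ys).transform(X), y

# Approach 2: sample first, then select features on (and keep) the sampled data
X2, y2 = selector.fit_transform(Xs, ys), ys

# Approach 3: select features on the full data, then sample the reduced data
Xr = selector.fit_transform(X, y)
X3, y3 = sampler.fit_resample(Xr, y)

print(X1.shape, X2.shape, X3.shape)
```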


2020 ◽  
Vol 8 (2S7) ◽  
pp. 2237-2240

In diagnosis and prediction systems, algorithms working on datasets with a high number of dimensions tend to take more time than those with fewer dimensions. Feature subset selection algorithms enhance the efficiency of machine learning algorithms in prediction problems by selecting a subset of the total features, thus pruning redundancy and noise. In this article, such a feature subset selection method is proposed and implemented to diagnose breast cancer using the Support Vector Machine (SVM) and K-Nearest Neighbor (KNN) algorithms. The feature selection algorithm is based on Social Group Optimization (SGO), an evolutionary algorithm. The proposed model achieves higher accuracy in diagnosing breast cancer when compared to other feature-selection-based machine learning algorithms.
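Any such wrapper method needs a fitness function that scores a candidate feature mask by classifier accuracy. A minimal sketch of that evaluation step follows, using the standard scikit-learn breast cancer dataset and an SVM; the SGO population update itself is not reproduced, and the settings are illustrative.

```python
# A minimal sketch of the wrapper fitness evaluation: a candidate bit mask is
# scored by cross-validated SVM accuracy on the selected columns.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

def fitness(mask):
    if not mask.any():                 # empty feature subsets are invalid
        return 0.0
    return cross_val_score(SVC(), X[:, mask], y, cv=5).mean()

rng = np.random.default_rng(0)
mask = rng.random(X.shape[1]) < 0.5    # one candidate solution from the optimizer
print(int(mask.sum()), "features ->", round(fitness(mask), 4))
```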

