A Novel Method of Constrained Feature Selection by the Measurement of Pairwise Constraints Uncertainty

2020 ◽  
Author(s):  
Kamal Berahmand ◽  
Mehrdad Rostami ◽  
Saman Forouzandeh

Abstract In recent years, advances in science and technology have produced ever-larger datasets in many fields, and these datasets now carry very many features. In a high-dimensional dataset, many features are generally redundant and/or irrelevant for a given learning task, which harms both computational cost and performance. The goal of feature selection over partially labeled data (semi-supervised feature selection) is to choose a subset of the available features with the lowest redundancy among themselves and the highest relevancy to the target class, the same objective as feature selection over fully labeled data. An appropriate reduction of the dimensionality therefore saves time and improves performance. In this paper, side information in the form of pairwise constraints is used to rank features and reduce the dimensionality. The proposed method assesses the quality (strength or uncertainty) of each pairwise constraint, a quantity that dimensionality-reduction methods usually do not compute. In the first step, a strength matrix is created from a similarity matrix and an uncertainty region; then, using the strength and similarity matrices, a new constrained feature selection ranking is proposed. The performance of the presented method was compared with state-of-the-art and well-known semi-supervised feature selection approaches on eight datasets. The findings indicate that the proposed approach improves on previous related approaches in terms of constrained-clustering accuracy. In particular, the numerical results showed that the presented approach improved classification accuracy by about 3% and reduced the number of selected features by 1%. Consequently, the proposed method reduces the computational complexity of the machine learning algorithm while increasing classification accuracy.
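
The abstract gives neither the strength-matrix construction nor the ranking formula, so the following is only a minimal sketch of a generic constraint-score ranking: each pairwise constraint is weighted by a placeholder strength taken directly from an RBF similarity matrix, and a feature scores well when must-link pairs agree on it and cannot-link pairs disagree. The constraint pairs, kernel, and weighting are assumptions, not the paper's method.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def constraint_score(X, must_link, cannot_link, strength):
    """Rank features by a constraint score: the strength-weighted squared
    difference over must-link pairs divided by that over cannot-link pairs
    (lower is better: linked samples should agree on a good feature)."""
    scores = []
    for f in range(X.shape[1]):
        ml = sum(strength[i, j] * (X[i, f] - X[j, f]) ** 2 for i, j in must_link)
        cl = sum(strength[i, j] * (X[i, f] - X[j, f]) ** 2 for i, j in cannot_link)
        scores.append(ml / (cl + 1e-12))
    return np.argsort(scores)            # indices of the best features first

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
S = rbf_kernel(X)                        # similarity matrix
strength = S                             # placeholder: the paper derives this from S and an uncertainty region
ranking = constraint_score(X, [(0, 1), (2, 3)], [(0, 4), (1, 5)], strength)
print("feature ranking:", ranking)
```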


2020 ◽  
Vol 7 (1) ◽  
Author(s):  
Mehrdad Rostami ◽  
Kamal Berahmand ◽  
Saman Forouzandeh

Abstract In the past decades, the rapid growth of computer and database technologies has produced large-scale datasets, and data mining applications that demand both high speed and high accuracy on high-dimensional data are increasing rapidly. Semi-supervised learning is a class of machine learning in which unlabeled and labeled data are used together to improve feature selection. The goal of feature selection over partially labeled data (semi-supervised feature selection) is to choose a subset of the available features with the lowest redundancy among themselves and the highest relevancy to the target class, the same objective as feature selection over fully labeled data. The proposed method reduces ambiguity in the range of similarity values by discretizing them: first, the similarity values of each pair are collected; these values are then divided into intervals and the average of each interval is determined; next, the number of pairs falling in each interval is counted. Finally, using the strength and similarity matrices, a new constrained feature selection ranking is proposed. The performance of the presented method was compared with state-of-the-art and well-known semi-supervised feature selection approaches on eight datasets. The results indicate that the proposed approach improves on previous related approaches in terms of the accuracy of the constrained score. In particular, the numerical results showed that the presented approach improved classification accuracy by about 3% and reduced the number of selected features by 1%. Consequently, the proposed method reduces the computational complexity of the machine learning algorithm while increasing classification accuracy.
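
A minimal sketch of the interval step described above: collect all pairwise similarity values, divide them into intervals, and record the mean and pair count per interval. The similarity function, the number of bins, and the equal-width edges are assumptions not stated in the abstract.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))

# Collect the similarity value of each pair (upper triangle, no diagonal).
S = rbf_kernel(X)
pair_sims = S[np.triu_indices_from(S, k=1)]

# Divide the values into equal-width intervals, then record the mean
# similarity and the number of pairs falling in each interval.
n_bins = 10
edges = np.linspace(pair_sims.min(), pair_sims.max(), n_bins + 1)
bin_idx = np.clip(np.digitize(pair_sims, edges) - 1, 0, n_bins - 1)
for b in range(n_bins):
    members = pair_sims[bin_idx == b]
    if members.size:
        print(f"interval {b}: mean={members.mean():.3f}, count={members.size}")
```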


Author(s):  
Heba F. Eid ◽  
Mostafa A. Salama ◽  
Aboul Ella Hassanien

Feature selection is a preprocessing step for machine learning that increases classification accuracy and reduces complexity. Feature selection methods fall into two main categories: filter and wrapper. Filter methods evaluate features without involving any learning algorithm, while wrapper methods depend on a learning algorithm for feature evaluation. A variety of hybrid filter-wrapper methods have been proposed in the literature; however, hybrid approaches suffer from the problem of determining the cut-off point of the ranked features, which can eliminate important features and thereby decrease classification accuracy. In this paper the authors propose a hybrid bi-layer behavioral-based feature selection approach that combines filter and wrapper feature selection methods and solves the cut-off point problem for the ranked features. It consists of two layers: in the first layer, information gain is used to rank the features, and a new feature set is selected at the global maximum of classification accuracy; in the second layer, a new subset of features is selected from the first layer's reduced dataset by searching for a group of local maxima of classification accuracy. The proposed approach was evaluated on the NSL-KDD dataset, where the number of features is reduced from 41 to 34 at the first layer and from 34 to 20 at the second layer, improving classification accuracy to 99.2%.
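
A rough sketch of a bi-layer selection of this kind, assuming mutual information as the information-gain ranking, a prefix search at the first layer, and a greedy drop search at the second; the paper's exact search procedure is not specified in the abstract, and the dataset and classifier here are stand-ins.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
clf = DecisionTreeClassifier(random_state=0)

# Layer 1: rank features by information gain and keep the prefix of the
# ranking whose cross-validated accuracy is the global maximum.
ranking = np.argsort(mutual_info_classif(X, y, random_state=0))[::-1]
layer1_scores = [cross_val_score(clf, X[:, ranking[:k]], y, cv=5).mean()
                 for k in range(1, len(ranking) + 1)]
k1 = int(np.argmax(layer1_scores)) + 1
kept = ranking[:k1]

# Layer 2: within the reduced set, greedily drop features as long as
# accuracy does not fall below the best seen so far (a local-maximum search).
best = max(layer1_scores)
subset = list(kept)
for f in list(subset):
    if len(subset) == 1:
        break
    trial = [g for g in subset if g != f]
    acc = cross_val_score(clf, X[:, trial], y, cv=5).mean()
    if acc >= best:
        best, subset = acc, trial
print(f"{len(subset)} features, accuracy {best:.3f}")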


Author(s):  
Mohsin Iqbal ◽  
Saif Ur Rehman ◽  
Saira Gillani ◽  
Sohail Asghar

The key objective of this chapter is to study classification accuracy when feature selection is used with machine learning algorithms. Feature selection reduces the dimensionality of the data and improves the accuracy of the learning algorithm. We test how integrated feature selection affects the accuracy of three classifiers by applying several feature selection methods. Among the filters, Information Gain (IG), Gain Ratio (GR), and Relief-F, and among the wrappers, Bagging and Naive Bayes (NB), enabled the classifiers to achieve the highest average increase in classification accuracy while reducing the volume of unnecessary attributes. These conclusions can guide machine learning users in choosing which classifier and feature selection methods to use to optimize classification accuracy; this is especially important in risk-sensitive applications of machine learning, where one aim is to reduce the costs of collecting, processing, and storing unnecessary data.
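
As a small illustration of the filter side of such a study (not the chapter's exact protocol), one can rank features with mutual information, an information-gain proxy, and compare a classifier's cross-validated accuracy before and after selection; the dataset, classifier, and cut-off are arbitrary choices here.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)
nb = GaussianNB()

# Baseline accuracy with all features.
base = cross_val_score(nb, X, y, cv=10).mean()

# Filter step: rank features by mutual information and keep the top third.
mi = mutual_info_classif(X, y, random_state=0)
top = np.argsort(mi)[::-1][: X.shape[1] // 3]
filtered = cross_val_score(nb, X[:, top], y, cv=10).mean()

print(f"all {X.shape[1]} features: {base:.3f}")
print(f"top {len(top)} by MI:     {filtered:.3f}")
```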


1998 ◽  
Vol 10 (7) ◽  
pp. 1895-1923 ◽  
Author(s):  
Thomas G. Dietterich

This article reviews five approximate statistical tests for determining whether one learning algorithm outperforms another on a particular learning task. These tests are compared experimentally to determine their probability of incorrectly detecting a difference when no difference exists (type I error). Two widely used statistical tests are shown to have high probability of type I error in certain situations and should never be used: a test for the difference of two proportions and a paired-differences t test based on taking several random train-test splits. A third test, a paired-differences t test based on 10-fold cross-validation, exhibits somewhat elevated probability of type I error. A fourth test, McNemar's test, is shown to have low type I error. The fifth test is a new test, 5 × 2 cv, based on five iterations of twofold cross-validation. Experiments show that this test also has acceptable type I error. The article also measures the power (ability to detect algorithm differences when they do exist) of these tests. The cross-validated t test is the most powerful. The 5 × 2 cv test is shown to be slightly more powerful than McNemar's test. The choice of the best test is determined by the computational cost of running the learning algorithm. For algorithms that can be executed only once, McNemar's test is the only test with acceptable type I error. For algorithms that can be executed 10 times, the 5 × 2 cv test is recommended, because it is slightly more powerful and because it directly measures variation due to the choice of training set.
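
The 5 × 2 cv statistic is straightforward to implement: with p_i^(j) the score difference between the two algorithms on fold j of replication i, p̄_i the mean of the two fold differences, and s_i² = (p_i^(1) − p̄_i)² + (p_i^(2) − p̄_i)², the statistic is t = p_1^(1) / √(Σ_i s_i² / 5) with 5 degrees of freedom. A compact sketch follows; the dataset and the two classifiers are arbitrary stand-ins, and accuracy differences are used in place of error differences (the sign flips, which the two-sided test absorbs).

```python
import numpy as np
from scipy.stats import t as t_dist
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

def five_by_two_cv_t(clf_a, clf_b, X, y):
    """Dietterich's 5x2cv paired t test: 5 replications of 2-fold CV;
    t = p_1^(1) / sqrt(mean of per-replication variances), df = 5."""
    p11, variances = None, []
    for rep in range(5):
        cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=rep)
        diffs = []
        for train, test in cv.split(X, y):
            acc_a = clf_a.fit(X[train], y[train]).score(X[test], y[test])
            acc_b = clf_b.fit(X[train], y[train]).score(X[test], y[test])
            diffs.append(acc_a - acc_b)
        p_bar = sum(diffs) / 2
        variances.append((diffs[0] - p_bar) ** 2 + (diffs[1] - p_bar) ** 2)
        if rep == 0:
            p11 = diffs[0]
    t_stat = p11 / np.sqrt(np.mean(variances))
    p_value = 2 * t_dist.sf(abs(t_stat), df=5)
    return t_stat, p_value

X, y = make_classification(n_samples=500, random_state=0)
print(five_by_two_cv_t(DecisionTreeClassifier(random_state=0), GaussianNB(), X, y))
```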


Mathematics ◽  
2020 ◽  
Vol 8 (2) ◽  
pp. 286 ◽  
Author(s):  
Hamid Saadatfar ◽  
Samiyeh Khosravi ◽  
Javad Hassannataj Joloudari ◽  
Amir Mosavi ◽  
Shahaboddin Shamshirband

The K-nearest neighbors (KNN) machine learning algorithm is a well-known non-parametric classification method. However, like other traditional data mining methods, applying it to big data comes with computational challenges: KNN determines the class of a new sample from the classes of its nearest neighbors, and identifying those neighbors in a large amount of data imposes a computational cost so large that a single computing machine can no longer handle it. One technique proposed to make classification methods applicable to large datasets is pruning. LC-KNN is an improved KNN method that first clusters the data into smaller partitions using K-means and then, for each new sample, applies KNN on the partition whose center is nearest. However, because the clusters have different shapes and densities, selecting the appropriate cluster is a challenge. In this paper, an approach is proposed to improve the pruning phase of the LC-KNN method by taking these factors into account. The proposed approach helps choose a more appropriate cluster of data in which to look for the neighbors, thus increasing the classification accuracy. The performance of the proposed approach is evaluated on several real datasets. The experimental results show its effectiveness, with higher classification accuracy and lower time cost than other recent relevant methods.
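
A sketch of the baseline LC-KNN pipeline that the paper improves upon, using the plain nearest-centre routing rule (the paper's shape- and density-aware cluster selection is not detailed in the abstract); cluster and neighbour counts are arbitrary choices.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

# Offline: partition the data with k-means and fit one KNN per cluster.
km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(X)
knns = {c: KNeighborsClassifier(n_neighbors=5).fit(X[km.labels_ == c],
                                                   y[km.labels_ == c])
        for c in range(8)}

# Online: route each query to the cluster with the nearest centre and
# search for neighbours only inside that partition.
def predict(x):
    c = int(np.argmin(np.linalg.norm(km.cluster_centers_ - x, axis=1)))
    return knns[c].predict(x.reshape(1, -1))[0]

print(predict(X[0]), y[0])
```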


2021 ◽  
Author(s):  
T Butler-Yeoman ◽  
Bing Xue ◽  
Mengjie Zhang

© 2015 IEEE. Feature selection is an important pre-processing step that can reduce the dimensionality of a dataset and increase the accuracy and efficiency of a learning/classification algorithm. However, existing feature selection algorithms, mainly wrappers and filters, have their own advantages and disadvantages. This paper proposes two filter-wrapper hybrid feature selection algorithms based on particle swarm optimisation (PSO). The first, named FastPSO, combines filter and wrapper evaluations in the PSO search process, with most evaluations performed as filters and a small number as wrappers. The second, named RapidPSO, further reduces the number of wrapper evaluations. A theoretical analysis of FastPSO and RapidPSO is conducted to investigate their complexity. FastPSO and RapidPSO are compared with a pure wrapper algorithm, WrapperPSO, and a pure filter algorithm, FilterPSO, on nine benchmark datasets of varying difficulty. The experimental results show that both FastPSO and RapidPSO successfully reduce the number of features while increasing classification performance over using all features. The two proposed algorithms maintain the high classification performance achieved by WrapperPSO and significantly reduce its computational time, although they select more features. They also increase the classification accuracy of FilterPSO and reduce its number of features, at the cost of increased computation. FastPSO outperforms RapidPSO in terms of classification accuracy and the number of features but requires more computational time, illustrating the trade-off between efficiency and effectiveness.
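
A toy illustration (not the paper's FastPSO or RapidPSO) of a PSO feature selection loop in which the bulk of fitness evaluations use a cheap precomputed filter score and the expensive wrapper is invoked only occasionally to check the global best; the inertia and acceleration constants, the mean-relevance filter score, and the evaluation schedule are all assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=15, random_state=0)
mi = mutual_info_classif(X, y, random_state=0)   # filter scores, computed once

def filter_fitness(mask):                        # cheap: mean relevance of selected features
    return mi[mask].mean() if mask.any() else 0.0

def wrapper_fitness(mask):                       # expensive: cross-validated accuracy
    if not mask.any():
        return 0.0
    return cross_val_score(KNeighborsClassifier(), X[:, mask], y, cv=3).mean()

rng = np.random.default_rng(0)
n_particles, n_feat = 10, X.shape[1]
pos = rng.random((n_particles, n_feat))          # a feature is selected where pos > 0.5
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_fit = np.array([filter_fitness(p > 0.5) for p in pos])
gbest = pbest[int(np.argmax(pbest_fit))].copy()

for it in range(30):
    for i in range(n_particles):
        fit = filter_fitness(pos[i] > 0.5)       # most evaluations are filter-based
        if fit > pbest_fit[i]:
            pbest_fit[i], pbest[i] = fit, pos[i].copy()
    gbest = pbest[int(np.argmax(pbest_fit))].copy()
    if it % 10 == 9:                             # occasionally confirm with the wrapper
        print(f"iter {it}: wrapper accuracy of gbest = {wrapper_fitness(gbest > 0.5):.3f}")
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, 0.0, 1.0)

print("selected features:", np.flatnonzero(gbest > 0.5))
```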


Complexity ◽  
2018 ◽  
Vol 2018 ◽  
pp. 1-15 ◽  
Author(s):  
Jaesung Lee ◽  
Dae-Won Kim

The data-driven management of real-life systems based on a trained model, which in turn is based on the data gathered from daily usage, has attracted a lot of attention because it realizes scalable control for large-scale and complex systems. To obtain a model within the computational budget imposed by practical constraints, the learning algorithm may need to identify essential data that carries important knowledge on the relation between the observed features, representing the measurement values, and the labels, encoding multiple target concepts. This results in an increased computational burden owing to the concurrent learning of multiple labels. A straightforward approach to this issue is feature selection; however, it may be insufficient to satisfy the practical constraints, because the computational cost of feature selection itself can be impractical when the number of labels is large. In this study, we propose an efficient multilabel feature selection method to achieve scalable multilabel learning when the number of labels is large. Empirical experiments on several multilabel datasets show that the multilabel learning process can be boosted without deteriorating the discriminating power of the multilabel classifier.
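
One generic way to keep multilabel feature selection cost linear in the number of labels (a baseline illustration, not necessarily the authors' method, which the abstract does not detail) is to aggregate a per-label relevance score across all labels and keep the top-scoring features:

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.feature_selection import mutual_info_classif

X, Y = make_multilabel_classification(n_samples=300, n_features=20,
                                      n_classes=5, random_state=0)

# Score each feature by its mutual information summed over all labels,
# then keep the top k. The cost grows linearly with the number of labels.
scores = sum(mutual_info_classif(X, Y[:, l], random_state=0)
             for l in range(Y.shape[1]))
top_k = np.argsort(scores)[::-1][:8]
print("selected features:", top_k)
```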


Author(s):  
Mostafa A. Salama ◽  
Ghada Hassan

Multivariate feature selection techniques search for the optimal feature subset to reduce the dimensionality, and hence the complexity, of a classification task. Statistical feature selection techniques measure the mutual correlation between features as well as the correlation of each feature to the target feature. However, adding a feature to a feature subset can deteriorate the classification accuracy even when that feature correlates positively with the target class. Although most existing feature ranking/selection techniques consider the interdependency between features, the nature of the interaction between features in relation to the classification problem is still not well investigated. This study proposes a forward feature selection technique that calculates a novel measure, Partnership-Gain, to select a subset of features whose partnership constructively correlates with the target feature classification. Comparative analysis with other well-known techniques shows that the proposed technique achieves enhanced or comparable classification accuracy on the datasets studied. We present a visualization of the degree and direction of the proposed measure of features' partnerships for a better understanding of the measure's nature.
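
The abstract does not define Partnership-Gain, so the sketch below substitutes a hypothetical stand-in: the cross-validated accuracy gain of the joint subset, which captures the "partnership" idea of judging a candidate by its joint effect with the already-selected features rather than in isolation; dataset and classifier are arbitrary.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)
clf = GaussianNB()

def subset_accuracy(features):
    return cross_val_score(clf, X[:, features], y, cv=5).mean()

# Forward selection: at each step, add the feature whose joint effect with
# the current subset improves accuracy most; stop when no candidate
# partners constructively (a stand-in for the Partnership-Gain criterion).
selected, remaining = [], list(range(X.shape[1]))
current = 0.0
while remaining:
    gains = {f: subset_accuracy(selected + [f]) - current for f in remaining}
    best_f, best_gain = max(gains.items(), key=lambda kv: kv[1])
    if best_gain <= 0:
        break
    selected.append(best_f)
    remaining.remove(best_f)
    current += best_gain
print(f"{len(selected)} features, accuracy {current:.3f}")
```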


Author(s):  
Chaonan Shen ◽  
Kai Zhang

Abstract In recent years, evolutionary algorithms have shown great advantages in the field of feature selection because of their simplicity and potential global search capability. However, most existing feature selection algorithms based on evolutionary computation are wrapper methods, which are computationally expensive, especially for high-dimensional biomedical data. To significantly reduce the computational cost, it is essential to study an effective evaluation method. In this paper, a two-stage improved gray wolf optimization (IGWO) algorithm for feature selection on high-dimensional data is proposed. In the first stage, a multilayer perceptron (MLP) network with group lasso regularization terms is trained, and the proposed algorithm is used to construct an integer optimization problem for the pre-selection of features and the optimization of the hidden-layer structure; the dataset is then compressed using the feature subset obtained in this stage. In the second stage, an MLP network with group lasso regularization terms is retrained on the compressed dataset, and the proposed algorithm is employed to construct the discrete optimization problem for feature selection. Meanwhile, a rapid evaluation strategy is constructed to mitigate the evaluation cost and improve efficiency during feature selection. The effectiveness of the algorithm was analyzed on ten gene expression datasets. The experimental results show that the proposed algorithm not only removes more than 95.7% of the features in all datasets but also achieves better classification accuracy on the test set. In addition, the advantages of the proposed algorithm in terms of time consumption, classification accuracy, and feature subset size become more prominent as the dimensionality of the feature selection problem increases, indicating that it is particularly suitable for high-dimensional feature selection problems.
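
The group-lasso pre-selection idea in the first stage can be sketched independently of the IGWO search (omitted here): penalize each input feature's entire column of first-layer weights so whole columns are driven toward zero, then keep the surviving features. Layer sizes, the regularization strength, and the survival threshold below are assumptions.

```python
import torch
import torch.nn as nn

# Toy data: 100 samples, 50 features, binary labels.
torch.manual_seed(0)
X = torch.randn(100, 50)
y = torch.randint(0, 2, (100,))

model = nn.Sequential(nn.Linear(50, 16), nn.ReLU(), nn.Linear(16, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()
lam = 1e-2                                   # group-lasso strength (assumed)

for epoch in range(200):
    opt.zero_grad()
    logits = model(X)
    # Group lasso over the first-layer weights: one group per input
    # feature (a column of W), pushing whole columns toward zero.
    W = model[0].weight                      # shape (16, 50)
    group_penalty = W.norm(dim=0).sum()
    loss = loss_fn(logits, y) + lam * group_penalty
    loss.backward()
    opt.step()

# Plain gradient steps do not reach exact zeros, so threshold the column
# norms; the surviving features form the pre-selected subset.
kept = (model[0].weight.norm(dim=0) > 1e-2).nonzero().flatten()
print("pre-selected features:", kept.tolist())
```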

