Dimensionality Reduction Algorithms on High Dimensional Datasets

Author(s):  
Iwan Syarif

Classification problems, especially on high dimensional datasets, have attracted many researchers seeking efficient approaches to address them. However, the classification problem becomes very complicated when the number of possible combinations of variables is high. In this research, we evaluate the performance of Genetic Algorithm (GA) and Particle Swarm Optimization (PSO) as feature selection algorithms when applied to high dimensional datasets. Our experiments show that in terms of dimensionality reduction, PSO is much better than GA: PSO reduced the number of attributes of 8 datasets to 13.47% of the original on average, while GA only reached 31.36% on average. In terms of classification performance, GA is slightly better than PSO: GA-reduced datasets outperform their original ones on 5 of 8 datasets, while PSO-reduced datasets do so on only 3 of 8.

Keywords: feature selection, dimensionality reduction, Genetic Algorithm (GA), Particle Swarm Optimization (PSO).
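
For readers unfamiliar with how PSO acts as a feature selector, the following is a minimal sketch of binary PSO with a wrapper fitness. The dataset, classifier, inertia and acceleration constants are illustrative stand-ins, not the setup used in the paper above.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
n_particles, n_features, iters = 20, X.shape[1], 30

def fitness(bits):
    mask = bits.astype(bool)
    if not mask.any():                        # an empty subset is invalid
        return 0.0
    return cross_val_score(KNeighborsClassifier(), X[:, mask], y, cv=3).mean()

pos = (rng.random((n_particles, n_features)) < 0.5).astype(float)
vel = rng.normal(0.0, 1.0, (n_particles, n_features))
pbest = pos.copy()
pbest_fit = np.array([fitness(p) for p in pos])
gbest = pbest[pbest_fit.argmax()].copy()

for _ in range(iters):
    r1, r2 = rng.random((2, n_particles, n_features))
    # Canonical velocity update, then a sigmoid transfer to re-sample bits.
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = (rng.random((n_particles, n_features)) < 1 / (1 + np.exp(-vel))).astype(float)
    fit = np.array([fitness(p) for p in pos])
    better = fit > pbest_fit
    pbest[better], pbest_fit[better] = pos[better], fit[better]
    gbest = pbest[pbest_fit.argmax()].copy()

print(f"selected {int(gbest.sum())} of {n_features} features")
```

A GA-based selector differs mainly in the search operators (crossover and mutation over the same bit-string encoding); the wrapper fitness is identical, which is why the two are directly comparable in the study above.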

2020, Vol 2020, pp. 1-8
Author(s):  
Longzhen Duan ◽  
Shuqing Yang ◽  
Dongbo Zhang

With the rapid increase in data size, there is growing demand for feature selection, which has been a powerful tool for handling high-dimensional data. In this paper, we propose a novel feature selection method, a niche particle swarm optimization (NPSO) based on a chaos group, used to evaluate feature importance. An iterative algorithm is proposed to optimize the new model, and it has been proved that solving the new model is equivalent to solving an NP problem with a flexible and adaptable norm regularization. First, the whole population is divided into two groups: the NPSO group and the chaos group. The two groups iterate separately, and the global optimum is updated. Second, cross-iteration between the NPSO group and the chaos group keeps particles from falling into local optima. Finally, the method is compared against three representative algorithms on 10 UCI datasets. The experimental results show that its feature selection performance is better than that of the comparison algorithms, and that classification accuracy is significantly improved.
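
The paper's exact grouping mechanism is not spelled out in the abstract, but the core idea can be illustrated on a toy objective: one half of the swarm follows standard PSO updates while the other half explores via a logistic chaotic map, with the global best shared between them. The split ratio, map constant, and sphere objective below are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
dim, n, iters = 10, 30, 100
sphere = lambda x: np.sum(x ** 2, axis=-1)          # toy minimisation target

pos = rng.uniform(-5, 5, (n, dim))
vel = np.zeros((n, dim))
chaos = rng.random((n // 2, dim))                   # state of the chaotic group
pbest, pbest_fit = pos.copy(), sphere(pos)
gbest = pbest[pbest_fit.argmin()].copy()

for _ in range(iters):
    half = n // 2
    # PSO group: canonical velocity/position update.
    r1, r2 = rng.random((2, half, dim))
    vel[:half] = (0.7 * vel[:half] + 1.5 * r1 * (pbest[:half] - pos[:half])
                  + 1.5 * r2 * (gbest - pos[:half]))
    pos[:half] += vel[:half]
    # Chaos group: the logistic map perturbs positions around the global best,
    # providing the diversity that keeps the swarm out of local optima.
    chaos = 4.0 * chaos * (1.0 - chaos)
    pos[half:] = gbest + (chaos - 0.5) * 2.0
    fit = sphere(pos)
    better = fit < pbest_fit
    pbest[better], pbest_fit[better] = pos[better], fit[better]
    gbest = pbest[pbest_fit.argmin()].copy()

print("best value:", sphere(gbest))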


2021
Author(s):  
B Tran ◽  
Bing Xue ◽  
Mengjie Zhang

With a global search mechanism, particle swarm optimization (PSO) has shown promise in feature selection (FS). However, most current PSO-based FS methods use a fixed-length representation, which is inflexible and limits the performance of PSO for FS. When applied to high-dimensional data, these methods not only consume a significant amount of memory but also incur a high computational cost. Overcoming this limitation enables PSO to work on data of much higher dimensionality, which has become increasingly common with advances in data collection technologies. In this paper, we propose the first variable-length PSO representation for FS, enabling particles to have different and shorter lengths, which defines a smaller search space and therefore improves the performance of PSO. By arranging features in descending order of relevance, we help particles with shorter lengths achieve better classification performance. Furthermore, using the proposed length-changing mechanism, PSO can jump out of local optima, further narrow the search space, and focus its search on smaller and more fruitful areas. These strategies enable PSO to reach better solutions in a shorter time. Results on ten high-dimensional datasets of varying difficulty show that the proposed variable-length PSO can achieve much smaller feature subsets with significantly higher classification performance in much shorter time than fixed-length PSO methods. The proposed method also outperformed the compared non-PSO FS methods in most cases.
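
The key representational idea can be sketched briefly: rank features by a relevance score, then let each particle search only over its own top-L slice of that ranking, so shorter particles cover the most promising features. The mutual-information score, the fixed lengths, and the 0.6 selection threshold below are illustrative assumptions; the paper's length-changing mechanism is more elaborate than this static sketch.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X, y = load_breast_cancer(return_X_y=True)
order = np.argsort(mutual_info_classif(X, y, random_state=0))[::-1]
X = X[:, order]                                   # columns now sorted by relevance

lengths = [10, 15, 20, 25, 30]                    # each particle has its own length
particles = [rng.random(L) for L in lengths]      # value > 0.6 selects that feature

def fitness(p):
    mask = p > 0.6
    if not mask.any():
        return 0.0
    cols = np.flatnonzero(mask)                   # indices within the top-len(p) slice
    return cross_val_score(KNeighborsClassifier(), X[:, cols], y, cv=3).mean()

for p, L in zip(particles, lengths):
    print(f"length {L}: {int((p > 0.6).sum())} features, fitness {fitness(p):.3f}")
```

Because a particle of length L can only ever touch the L most relevant features, the effective search space shrinks from 2^N to 2^L, which is where the memory and runtime savings come from.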


2021
Author(s):  
B Tran ◽  
Bing Xue ◽  
Mengjie Zhang

In machine learning, discretization and feature selection (FS) are important techniques for preprocessing data to improve the performance of an algorithm on high-dimensional data. Since many FS methods require discrete data, a common practice is to apply discretization before FS. In addition, for the sake of efficiency, features are usually discretized individually (univariate discretization). This scheme rests on the assumption that each feature influences the task independently, which may not hold when feature interactions exist. Univariate discretization may therefore degrade the performance of the FS stage, since information about feature interactions can be lost during discretization. Initial results of our previously proposed method, evolve particle swarm optimization (EPSO), showed that combining discretization and FS in a single stage using bare-bones particle swarm optimization (BBPSO) can lead to better performance than applying them in two separate stages. In this paper, we propose a new method called potential particle swarm optimization (PPSO), which employs a new representation that reduces the search space of the problem and a new fitness function to better evaluate candidate solutions and guide the search. Results on ten high-dimensional datasets show that PPSO selects less than 5% of the features on all datasets. Compared with the two-stage approach that uses BBPSO for FS on the discretized data, PPSO achieves significantly higher accuracy on seven datasets. In addition, PPSO obtains better (or similar) classification performance than EPSO on eight datasets, with fewer selected features on six datasets. Furthermore, PPSO outperforms the three compared traditional methods and performs similarly to one other method on most datasets in terms of both generalization ability and learning capacity.
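
One way to picture a single-stage encoding of discretization plus selection: each particle position holds one candidate cut-point per feature, and a feature is kept (and binarized at its cut-point) only when the cut-point falls strictly inside the feature's observed range. This is an illustration of the idea under our own assumptions, not the exact PPSO representation or fitness function.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X, y = load_breast_cancer(return_X_y=True)
lo, hi = X.min(axis=0), X.max(axis=0)

# Sample cut-points from a slightly widened range so that some fall outside
# a feature's range, which de-selects the corresponding feature.
cuts = rng.uniform(lo - 0.5 * (hi - lo), hi + 0.5 * (hi - lo))

selected = (cuts > lo) & (cuts < hi)              # in-range cut => feature kept
X_disc = (X[:, selected] > cuts[selected]).astype(float)

clf = KNeighborsClassifier()
print(f"{selected.sum()} features kept, "
      f"accuracy {cross_val_score(clf, X_disc, y, cv=3).mean():.3f}")
```

Evaluating the classifier on the discretized subset directly is what lets interaction information survive: selection and cut-point placement are judged jointly rather than feature by feature.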


Feature selection in high dimensional datasets is a combinatorial problem: it selects an optimal subset from N-dimensional data with 2^N possible subsets. Genetic algorithms are generally a good choice for feature selection in large datasets, though for some high dimensional problems they may take anywhere from a few seconds to several hours or even days. It is therefore important to use genetic algorithms that deliver quality results within an acceptable time limit, which makes an efficient implementation necessary. In this paper, a master-slave parallel genetic algorithm is implemented as a feature selection procedure to reduce the time complexity of the sequential genetic algorithm. The paper describes the speed gains of the parallel master-slave genetic algorithm and also presents a theoretical analysis of the optimal number of slaves required for an efficient master-slave implementation. The experiments are performed on three high-dimensional gene expression datasets. Because a genetic algorithm is a wrapper technique and evaluating each feature's importance is time-consuming, the Information Gain technique is applied first as a pre-processing step to remove irrelevant features.
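
The master-slave pattern itself is simple: the master runs selection, crossover, and mutation, while a pool of slave workers evaluates chromosome fitness in parallel, since fitness evaluation dominates the runtime of a wrapper GA. In this minimal sketch the toy fitness (counting hits on a set of "useful" feature indices) stands in for an expensive classifier-based wrapper evaluation, and all GA parameters are illustrative.

```python
import numpy as np
from multiprocessing import Pool

N_FEATURES, POP, GENS = 200, 40, 25
USEFUL = set(range(20))                            # pretend these features matter

def fitness(chrom):
    # In a real wrapper this would train and cross-validate a classifier.
    hits = sum(1 for i in np.flatnonzero(chrom) if i in USEFUL)
    return hits - 0.01 * chrom.sum()               # reward hits, penalize subset size

def evolve():
    rng = np.random.default_rng(4)
    pop = rng.random((POP, N_FEATURES)) < 0.5      # bit-string chromosomes
    with Pool(4) as workers:                       # 4 slaves; tune to your cores
        for _ in range(GENS):
            fits = np.array(workers.map(fitness, list(pop)))   # parallel step
            order = np.argsort(fits)[::-1]
            parents = pop[order[: POP // 2]]       # truncation selection
            cross = rng.integers(1, N_FEATURES, POP // 2)
            kids = np.array([np.concatenate((parents[i][:c],
                                             parents[(i + 1) % len(parents)][c:]))
                             for i, c in enumerate(cross)])     # one-point crossover
            kids ^= rng.random(kids.shape) < 0.01  # bit-flip mutation
            pop = np.vstack((parents, kids))
    best = pop[np.argmax([fitness(c) for c in pop])]
    print("best subset size:", int(best.sum()))

if __name__ == "__main__":
    evolve()
```

With a synchronous pool like this, speed-up saturates once per-generation communication costs rival the evaluation time, which is exactly the trade-off behind the paper's analysis of the optimal number of slaves.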


2021
Author(s):  
T Butler-Yeoman ◽  
Bing Xue ◽  
Mengjie Zhang

Feature selection is an important pre-processing step that can reduce the dimensionality of a dataset and increase the accuracy and efficiency of a learning/classification algorithm. However, existing feature selection algorithms, mainly wrappers and filters, have their own advantages and disadvantages. This paper proposes two filter-wrapper hybrid feature selection algorithms based on particle swarm optimisation (PSO). The first algorithm, FastPSO, combines filter and wrapper evaluations within the search process of PSO for feature selection, performing most evaluations as filters and a small number as wrappers. The second algorithm, RapidPSO, further reduces the number of wrapper evaluations. A theoretical analysis of FastPSO and RapidPSO is conducted to investigate their complexity. FastPSO and RapidPSO are compared with a pure wrapper algorithm, WrapperPSO, and a pure filter algorithm, FilterPSO, on nine benchmark datasets of varying difficulty. The experimental results show that both FastPSO and RapidPSO can successfully reduce the number of features and simultaneously increase classification performance over using all features. The two proposed algorithms maintain the high classification performance achieved by WrapperPSO and significantly reduce the computational time, although they select more features. At the same time, they improve on the classification accuracy of FilterPSO and reduce the number of features, but at a higher computational cost. FastPSO outperformed RapidPSO in terms of classification accuracy and the number of features, but required more computational time, illustrating the trade-off between efficiency and effectiveness.
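
A minimal sketch of the hybrid evaluation idea, in the spirit of FastPSO/RapidPSO: most fitness calls return a cheap precomputed filter score, while a small fraction trigger an expensive wrapper evaluation. The 10% wrapper rate, the mean-mutual-information filter score, and the classifier are assumptions for illustration, not the algorithms' actual schedules.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
X, y = load_breast_cancer(return_X_y=True)
mi = mutual_info_classif(X, y, random_state=0)     # precomputed filter scores

def hybrid_fitness(mask, wrapper_rate=0.1):
    if not mask.any():
        return 0.0
    if rng.random() < wrapper_rate:                # rare, expensive wrapper call
        return cross_val_score(KNeighborsClassifier(), X[:, mask], y, cv=3).mean()
    return mi[mask].mean()                         # cheap filter proxy otherwise

mask = rng.random(X.shape[1]) < 0.3
print("hybrid fitness:", round(hybrid_fitness(mask), 3))
```

Note that filter and wrapper scores live on different scales, so a real implementation would normalise them or track them separately; the sketch only shows where the cost saving comes from.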

