Estimation of Distribution Algorithms for Feature Subset Selection in Large Dimensionality Domains

Feature Subset Selection (FSS) is a well-known task in Machine Learning, Data Mining, Pattern Recognition, and Text Learning. Genetic Algorithms (GAs) are possibly the most commonly used algorithms for FSS tasks. Although the FSS literature contains many papers, few of them tackle FSS in domains with more than 50 features. In this chapter we present a novel search heuristic paradigm, called Estimation of Distribution Algorithms (EDAs), as an alternative to GAs for performing a population-based, randomized search in datasets of large dimensionality. The EDA paradigm avoids the use of genetic crossover and mutation operators to evolve the populations. In the absence of these operators, evolution is guaranteed by factorizing the probability distribution of the best solutions found in a generation of the search and then sampling this distribution to obtain a new pool of solutions. We present four different probabilistic models to perform this factorization. In a comparison with two types of GAs on natural and artificial datasets of large dimensionality, the EDA-based approaches obtain encouraging accuracy results and need fewer evaluations than the genetic approaches.
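The estimate-then-sample loop described in the abstract can be illustrated with a minimal sketch. It assumes the simplest possible factorization, in which every feature bit is treated as independent (a univariate model, often called UMDA); the abstract does not say which four models the chapter uses, so this is not a reproduction of the authors' method. The function names, the toy fitness function, and all parameter values are hypothetical.

```python
import random

def umda_select(fitness, n_features, pop_size=50, n_best=25,
                generations=30, seed=0):
    """Univariate EDA sketch: no crossover or mutation, only
    'estimate marginals from the best, then sample a new population'."""
    rng = random.Random(seed)
    probs = [0.5] * n_features          # initial model: each feature in/out at 50%
    best, best_fit = None, float("-inf")
    for _ in range(generations):
        # Sample a population of feature masks from the current model.
        population = [[1 if rng.random() < p else 0 for p in probs]
                      for _ in range(pop_size)]
        ranked = sorted(population, key=fitness, reverse=True)
        top_fit = fitness(ranked[0])
        if top_fit > best_fit:
            best, best_fit = ranked[0], top_fit
        # Re-estimate each marginal from the selected individuals.
        # Laplace smoothing (an assumption) keeps probabilities off 0 and 1.
        selected = ranked[:n_best]
        probs = [(sum(ind[i] for ind in selected) + 1) / (n_best + 2)
                 for i in range(n_features)]
    return best, best_fit

# Hypothetical toy fitness: features 0-4 are relevant, each extra feature costs 0.5.
RELEVANT = set(range(5))

def toy_fitness(mask):
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in RELEVANT)
    extras = sum(1 for i, bit in enumerate(mask) if bit and i not in RELEVANT)
    return hits - 0.5 * extras

best, score = umda_select(toy_fitness, n_features=20)
print(best, score)
```

In a real FSS setting the toy fitness would be replaced by an estimate of classifier accuracy on the selected features, which is where most of the evaluation cost lies.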

Feature Subset Selection by Estimation of Distribution Algorithms

Estimation of Distribution Algorithms - Genetic Algorithms and Evolutionary Computation ◽

10.1007/978-1-4615-1539-5_13 ◽

2002 ◽

pp. 269-293 ◽

Cited By ~ 5

Author(s):

I. Inza ◽

P. Larrañaga ◽

B. Sierra

Keyword(s):

Subset Selection ◽

Feature Subset Selection ◽

Feature Subset ◽

Estimation Of Distribution Algorithms ◽

Estimation Of Distribution ◽

Distribution Algorithms

Feature subset selection by genetic algorithms and estimation of distribution algorithms

Artificial Intelligence in Medicine ◽

10.1016/s0933-3657(01)00085-9 ◽

2001 ◽

Vol 23 (2) ◽

pp. 187-205 ◽

Cited By ~ 33

Author(s):

I. Inza ◽

M. Merino ◽

P. Larrañaga ◽

J. Quiroga ◽

B. Sierra ◽

...

Keyword(s):

Genetic Algorithms ◽

Subset Selection ◽

Feature Subset Selection ◽

Feature Subset ◽

Estimation Of Distribution Algorithms ◽

Estimation Of Distribution ◽

Distribution Algorithms

Prototype Selection and Feature Subset Selection by Estimation of Distribution Algorithms. A Case Study in the Survival of Cirrhotic Patients Treated with TIPS

Artificial Intelligence in Medicine - Lecture Notes in Computer Science ◽

10.1007/3-540-48229-6_3 ◽

2001 ◽

pp. 20-29 ◽

Cited By ~ 12

Author(s):

B. Sierra ◽

E. Lazkano ◽

I. Inza ◽

M. Merino ◽

P. Larrañaga ◽

...

Keyword(s):

Subset Selection ◽

Feature Subset Selection ◽

Feature Subset ◽

Estimation Of Distribution Algorithms ◽

Prototype Selection ◽

Estimation Of Distribution ◽

Cirrhotic Patients ◽

Distribution Algorithms

Classifier Subset Selection to construct multi-classifiers by means of estimation of distribution algorithms

Neurocomputing ◽

10.1016/j.neucom.2015.01.036 ◽

2015 ◽

Vol 157 ◽

pp. 46-60 ◽

Cited By ~ 24

Author(s):

Iñigo Mendialdua ◽

Andoni Arruti ◽

Ekaitz Jauregi ◽

Elena Lazkano ◽

Basilio Sierra

Keyword(s):

Subset Selection ◽

Estimation Of Distribution Algorithms ◽

Estimation Of Distribution ◽

Distribution Algorithms

GENE SELECTION FOR CANCER CLASSIFICATION USING WRAPPER APPROACHES

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001404003800 ◽

2004 ◽

Vol 18 (08) ◽

pp. 1373-1390 ◽

Cited By ~ 46

Author(s):

ROSA BLANCO ◽

PEDRO LARRAÑAGA ◽

IÑAKI INZA ◽

BASILIO SIERRA

Keyword(s):

Gene Expression ◽

Gene Selection ◽

Cancer Classification ◽

Feature Subset Selection ◽

Feature Subset ◽

Estimation Of Distribution ◽

Distribution Algorithms ◽

Selection Algorithms ◽

Selection Of ◽

General Method

Although cancer classification has improved considerably, a general method that classifies known types of cancer has not yet been developed. In this work, we propose the use of supervised classification techniques, coupled with feature subset selection algorithms, to perform this classification automatically in gene expression datasets. Because gene expression datasets have a large number of features, the search for a highly accurate combination of features is carried out by means of the new Estimation of Distribution Algorithms paradigm. To assess the accuracy of the proposed approach, the naïve-Bayes classification algorithm is employed in a wrapper form. Promising results are achieved, together with a considerable reduction in the number of genes. By stating the optimal selection of genes as a search task, the final choice of genes is made automatically and robustly, in contrast to previous works that address the same types of problems.
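As a rough illustration of what "naïve-Bayes in a wrapper form" means, the sketch below scores a candidate feature mask by the leave-one-out accuracy of a tiny naïve-Bayes classifier trained only on the selected features. The binary features, Laplace smoothing, leave-one-out protocol, function names, and toy data are all assumptions made for the example; the abstract does not specify the evaluation protocol used in the paper.

```python
import math
from collections import Counter

def nb_predict(train_X, train_y, x, features):
    """Predict the class of x with naive Bayes over the given binary features."""
    best_c, best_lp = None, float("-inf")
    for c, n_c in Counter(train_y).items():
        rows = [r for r, y in zip(train_X, train_y) if y == c]
        lp = math.log(n_c / len(train_y))            # log prior
        for f in features:
            match = sum(1 for r in rows if r[f] == x[f])
            lp += math.log((match + 1) / (n_c + 2))  # Laplace-smoothed likelihood
        if lp > best_lp:
            best_c, best_lp = c, lp
    return best_c

def wrapper_accuracy(X, y, mask):
    """Wrapper fitness: leave-one-out accuracy using only the masked-in features."""
    features = [i for i, bit in enumerate(mask) if bit]
    correct = 0
    for k in range(len(X)):
        train_X, train_y = X[:k] + X[k + 1:], y[:k] + y[k + 1:]
        if nb_predict(train_X, train_y, X[k], features) == y[k]:
            correct += 1
    return correct / len(X)

# Hypothetical toy data: feature 0 equals the label, feature 1 is uninformative.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [0, 1], [1, 0], [0, 0], [1, 1]]
y = [0, 0, 1, 1, 0, 1, 0, 1]

print(wrapper_accuracy(X, y, [1, 0]))  # informative feature only
print(wrapper_accuracy(X, y, [0, 1]))  # uninformative feature only
```

A search procedure such as an EDA would call `wrapper_accuracy` as its fitness function, so every candidate gene subset costs one full round of classifier training and testing.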

Estimation of Distribution Algorithms Applied to History Matching

SPE Journal ◽

10.2118/141161-pa ◽

2013 ◽

Vol 18 (03) ◽

pp. 508-517 ◽

Cited By ~ 6

Author(s):

Asaad Abdollahzadeh ◽

Alan Reynolds ◽

Mike Christie ◽

David Corne ◽

Glyn Williams ◽

...

Keyword(s):

History Matching ◽

Population Based ◽

Fast Convergence ◽

Bayesian Optimization ◽

Evolutionary Strategies ◽

Estimation Of Distribution Algorithms ◽

Research Activity ◽

Matching Problems ◽

Estimation Of Distribution ◽

Distribution Algorithms

Summary: The topic of automatically history-matched reservoir models has seen much research activity in recent years. History matching is an example of an inverse problem, and there is significant active research on inverse problems in many other scientific and engineering areas. While many techniques from other fields, such as genetic algorithms, evolutionary strategies, differential evolution, particle swarm optimization, and the ensemble Kalman filter, have been tried in the oil industry, more recent and effective ideas have yet to be tested. One of these relatively untested ideas is a class of algorithms known as estimation of distribution algorithms (EDAs). EDAs are population-based algorithms that use probability models to estimate the probability distribution of promising solutions and then to generate new candidate solutions. EDAs have been shown to be very efficient in very complex high-dimensional problems. An example of a state-of-the-art EDA is the Bayesian optimization algorithm (BOA), which is a multivariate EDA employing Bayesian networks for modeling the relationships between good solutions. The use of a Bayesian network leads to relatively fast convergence as well as high diversity in the matched models. Given the relatively limited number of reservoir simulations used in history matching, EDA-BOA offers the promise of high-quality history matches with a fast convergence rate. In this paper, we introduce EDAs and describe BOA in detail. We show results of the EDA-BOA algorithm on two history-matching problems. First, we tune the algorithm and demonstrate convergence speed and search diversity on the PUNQ-S3 synthetic case. Second, we apply the algorithm to a real North Sea turbidite field with multiple wells. In both examples, we show improvements in performance over traditional population-based algorithms.

Addressing the advantages of using ensemble probabilistic models in Estimation of Distribution Algorithms for scheduling problems

International Journal of Production Economics ◽

10.1016/j.ijpe.2012.05.010 ◽

2013 ◽

Vol 141 (1) ◽

pp. 24-33 ◽

Cited By ~ 23

Author(s):

Shih-Hsin Chen ◽

Min-Chih Chen

Keyword(s):

Probabilistic Models ◽

Estimation Of Distribution Algorithms ◽

Scheduling Problems ◽

Estimation Of Distribution ◽

Distribution Algorithms

Feature subset selection in text-learning

Machine Learning: ECML-98 - Lecture Notes in Computer Science ◽

10.1007/bfb0026677 ◽

1998 ◽

pp. 95-100 ◽

Cited By ~ 43

Author(s):

Dunja Mladenić

Keyword(s):

Subset Selection ◽

Feature Subset Selection ◽

Feature Subset ◽

Text Learning

FAULT DIAGNOSIS OF MULTIPROCESSOR SYSTEMS BASED ON GENETIC AND ESTIMATION OF DISTRIBUTION ALGORITHMS: A PERFORMANCE EVALUATION

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213010000017 ◽

2010 ◽

Vol 19 (01) ◽

pp. 1-18 ◽

Cited By ~ 2

Author(s):

ELIAS P. DUARTE ◽

AURORA T. R. POZO ◽

BOGDAN T. NASSU

Keyword(s):

Large Scale ◽

Optimal Solution ◽

Population Based ◽

Experimental Results ◽

System Level ◽

Multiprocessor Systems ◽

Estimation Of Distribution Algorithms ◽

Estimation Of Distribution ◽

Distribution Algorithms ◽

Average Fitness

As faults are unavoidable in large-scale multiprocessor systems, it is important to be able to determine which units of the system are working and which are faulty. System-level diagnosis is a long-standing, realistic approach to detecting faults in multiprocessor systems. Diagnosis is based on the results of tests executed on the system units. In this work we evaluate the performance of evolutionary algorithms applied to the diagnosis problem. Experimental results are presented for both the traditional genetic algorithm (GA) and specialized versions of the GA. We then propose and evaluate specialized versions of Estimation of Distribution Algorithms (EDAs) for system-level diagnosis: the compact GA and Population-Based Incremental Learning, both with and without negative examples. The evaluation was performed using four metrics: the average number of generations needed to find the solution, the average fitness after up to 500 generations, the percentage of tests that reached the optimal solution, and the average time until the solution was found. An analysis of experimental results shows that more sophisticated algorithms converge faster to the optimal solution.
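For readers unfamiliar with the compact GA named in this abstract, the following is a minimal sketch of the generic algorithm on the OneMax toy problem (maximize the number of 1 bits). It does not reproduce the paper's specialized variants or the diagnosis fitness function; the function name, parameter values, and the choice of OneMax are assumptions made for illustration.

```python
import random

def compact_ga(fitness, n_bits, virtual_pop=100, max_iters=50000, seed=1):
    """Compact GA sketch: the population is replaced by one probability per bit,
    stored here as an integer count out of `virtual_pop` to avoid float drift."""
    rng = random.Random(seed)
    counts = [virtual_pop // 2] * n_bits     # every bit starts at probability 0.5

    def sample():
        return [1 if rng.random() < c / virtual_pop else 0 for c in counts]

    for _ in range(max_iters):
        a, b = sample(), sample()
        winner, loser = (a, b) if fitness(a) >= fitness(b) else (b, a)
        # Shift the model by 1/virtual_pop toward the winner where the two differ.
        for i in range(n_bits):
            if winner[i] != loser[i]:
                counts[i] += 1 if winner[i] == 1 else -1
        # Stop once every bit probability has converged to 0 or 1.
        if all(c in (0, virtual_pop) for c in counts):
            break
    return [1 if 2 * c >= virtual_pop else 0 for c in counts]

# OneMax: fitness is simply the number of 1 bits.
solution = compact_ga(sum, n_bits=16)
print(solution, sum(solution))
```

Population-Based Incremental Learning follows the same pattern but updates the probability vector toward the best of a larger sample using a learning rate, rather than by fixed steps from a pairwise competition.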

Sensibility of Linkage Information and Effectiveness of Estimated Distributions

Evolutionary Computation ◽

10.1162/evco_a_00010 ◽

2010 ◽

Vol 18 (4) ◽

pp. 547-579 ◽

Cited By ~ 1

Author(s):

Chung-Yao Chuang ◽

Ying-ping Chen

Keyword(s):

Probabilistic Models ◽

Model Building ◽

Estimation Of Distribution Algorithms ◽

Problem Structure ◽

Linkage Information ◽

Estimation Of Distribution ◽

Separable Problems ◽

Distribution Algorithms ◽

The Given ◽

Possible Cause

The probabilistic model building performed by estimation of distribution algorithms (EDAs) enables these methods to use advanced techniques of statistics and machine learning for automatic discovery of problem structures. However, in some situations, it may not be possible to completely and accurately identify the whole problem structure by probabilistic modeling due to certain inherent properties of the given problem. In this work, we illustrate one possible cause of such situations with problems consisting of structures with unequal fitness contributions. Based on the illustrative example, we introduce a notion that the estimated probabilistic models should be inspected to reveal the effective search directions and further propose a general approach which utilizes a reserved set of solutions to examine the built model for likely inaccurate fragments. Furthermore, the proposed approach is implemented on the extended compact genetic algorithm (ECGA) and experiments are performed on several sets of additively separable problems with different scaling setups. The results indicate that the proposed method can significantly assist ECGA to handle problems comprising structures of disparate fitness contributions and therefore may potentially help EDAs in general to overcome those situations in which the entire problem structure cannot be recognized properly due to the temporal delay of emergence of some promising partial solutions.
