Numerical Simplification and its Effect on Fragment Distributions in Genetic Programming

2021
Author(s): Alan David Kinzett

<p>In tree-based genetic programming (GP) there is a tendency for the program trees to increase in size from one generation to the next. When this growth is not accompanied by an improvement in fitness, it is known as bloat. It is standard practice to place some form of control on program size. This can be done by limiting the number of nodes or the depth of the program trees, by adding a component to the fitness function that rewards smaller programs (parsimony pressure), or by simplifying individual programs using algebraic methods. This thesis proposes a novel program simplification method, called numerical simplification, that uses only the range of values the nodes take during fitness evaluation. The effect of online program simplification, both algebraic and numerical, on program size and resource usage is examined. This thesis also examines the distribution of program fragments within a genetic programming population and how this distribution is changed by simplification. It is shown that both simplification approaches reduce average program size, memory used and computation time, and that numerical simplification performs at least as well as algebraic simplification and in some cases outperforms it. These reductions in program size and in the resources required for the GP run come without any significant reduction in accuracy. It is also shown that although the two online simplification methods destroy some existing program fragments, they generate new fragments during evolution, which compensates for any negative effects from the disruption of existing fragments. Finally, it is shown that, after the first few generations, the rate at which new fragments are created, the rate at which fragments are lost from the population, and the number of distinct fragments in the population all remain within a very narrow range of values for the remainder of the run.</p>
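The abstract does not spell out the simplification rules, so the following is only a minimal sketch of the general idea of range-based simplification. It assumes that each node records the smallest and largest value it produces during fitness evaluation, and that a subtree whose observed range is narrower than a tolerance can be replaced by a constant. The `Node` class, the `simplify` function and the `tol` threshold are hypothetical names introduced for illustration, not taken from the thesis.

```python
from dataclasses import dataclass, field
import math
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


@dataclass
class Node:
    op: str                       # "+", "-", "*", "x" (input variable) or "const"
    children: list = field(default_factory=list)
    value: float = 0.0            # used only when op == "const"
    lo: float = math.inf          # smallest value observed during evaluation
    hi: float = -math.inf         # largest value observed during evaluation

    def evaluate(self, x):
        if self.op == "x":
            result = x
        elif self.op == "const":
            result = self.value
        else:
            left, right = (c.evaluate(x) for c in self.children)
            result = OPS[self.op](left, right)
        # Record the range of values this node takes (assumed bookkeeping).
        self.lo = min(self.lo, result)
        self.hi = max(self.hi, result)
        return result

    def size(self):
        return 1 + sum(c.size() for c in self.children)


def simplify(node, tol=1e-9):
    """Replace subtrees with a (near-)constant observed range by a constant."""
    evaluated = node.lo <= node.hi
    if node.op != "const" and evaluated and node.hi - node.lo <= tol:
        return Node("const", value=(node.lo + node.hi) / 2,
                    lo=node.lo, hi=node.hi)
    node.children = [simplify(c, tol) for c in node.children]
    return node


# (x - x) + (x * 2): the left subtree always evaluates to 0.
tree = Node("+", [
    Node("-", [Node("x"), Node("x")]),
    Node("*", [Node("x"), Node("const", value=2.0)]),
])
for x in (1.0, 2.0, 3.0):         # stand-in for the fitness cases
    tree.evaluate(x)

size_before = tree.size()
simplified = simplify(tree)
print(size_before, simplified.size())   # 7 5
```

In this toy run the `x - x` subtree is collapsed to the constant 0 because its observed range has zero width, while the program's output on the fitness cases is unchanged. Because such a rule relies only on values observed on the training inputs, a replacement is justified on those inputs only.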


2021
pp. 1-26
Author(s): Wenbin Pei, Bing Xue, Lin Shang, Mengjie Zhang

High-dimensional unbalanced classification is challenging because of the joint effects of high dimensionality and class imbalance. Genetic programming (GP) has potential benefits for high-dimensional classification due to its built-in capability to select informative features. However, when data are not evenly distributed, GP tends to develop biased classifiers that achieve high accuracy on the majority class but low accuracy on the minority class. Unfortunately, the minority class is often at least as important as the majority class. It is therefore important to investigate how GP can be used effectively for high-dimensional unbalanced classification. In this paper, to address the performance bias of GP, a new two-criterion fitness function is developed. It considers two criteria: an approximation of the area under the curve (AUC) and the classification clarity (i.e. how well a program can separate the two classes). The values obtained on the two criteria are combined in pairs rather than summed. Furthermore, this paper designs a three-criterion tournament selection to identify and select good programs for the genetic operators to use in generating better offspring during the evolutionary learning process. The experimental results show that the proposed method achieves better classification performance than the other compared methods.
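To illustrate why an AUC-style criterion is less biased than plain accuracy on unbalanced data, here is a minimal sketch. It estimates AUC with the Wilcoxon-Mann-Whitney statistic, a standard estimator; the abstract does not say which approximation the paper uses, and the classification-clarity criterion is not sketched because it is not defined here. The function names and the zero decision threshold are assumptions made for illustration.

```python
def wmw_auc(minority_outputs, majority_outputs):
    """Fraction of (minority, majority) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in minority_outputs:
        for n in majority_outputs:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(minority_outputs) * len(majority_outputs))


def accuracy(minority_outputs, majority_outputs, threshold=0.0):
    """Overall accuracy, assuming outputs above the threshold mean 'minority'."""
    correct = sum(p > threshold for p in minority_outputs)
    correct += sum(n <= threshold for n in majority_outputs)
    return correct / (len(minority_outputs) + len(majority_outputs))


# A degenerate program that outputs -1 for every instance
# (2 minority instances, 8 majority instances).
minority = [-1.0, -1.0]
majority = [-1.0] * 8

print(accuracy(minority, majority))  # 0.8
print(wmw_auc(minority, majority))   # 0.5
```

The degenerate program scores 80% accuracy while never recognising a minority instance, whereas its estimated AUC of 0.5 shows that it separates the classes no better than chance.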


2015
Vol 23 (3)
pp. 343-367
Author(s): Torsten Hildebrandt, Jürgen Branke

One way to accelerate evolutionary algorithms with expensive fitness evaluations is to combine them with surrogate models. Surrogate models are efficiently computable approximations of the fitness function, derived by means of statistical or machine learning techniques from samples of fully evaluated solutions. But these models usually require a numerical representation, and therefore cannot be used with the tree representation of genetic programming (GP). In this paper, we present a new way to use surrogate models with GP. Rather than using the genotype directly as input to the surrogate model, we propose using a phenotypic characterization. This phenotypic characterization can be computed efficiently and allows us to define approximate measures of equivalence and similarity. Using a stochastic, dynamic job shop scenario as an example of simulation-based GP with an expensive fitness evaluation, we show how these ideas can be used to construct surrogate models and improve the convergence speed and solution quality of GP.


2017
Vol 42 (4)
pp. 339-358
Author(s): Krzysztof Krawiec, Paweł Liskowski

Genetic programming (GP) is a variant of evolutionary algorithm in which the entities undergoing simulated evolution are computer programs. A fitness function in GP is usually based on a set of tests, each of which defines the desired output a correct program should return for an exemplary input. The outcomes of interactions between programs and tests in GP can be represented as an interaction matrix, with rows corresponding to programs in the current population and columns corresponding to tests. In previous work, we proposed SFIMX, a method that performs only a fraction of the interactions and employs non-negative matrix factorization to estimate the outcomes of the remaining ones, shortening GP’s runtime. In this paper, we build upon that work and propose three extensions of SFIMX, in which the subset of tests drawn to perform interactions is selected with respect to test difficulty. The experiment indicates that the proposed extensions surpass the original SFIMX on a suite of discrete GP benchmarks.
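The interaction matrix described above can be sketched in a few lines. The example below builds a full binary matrix for a toy population and computes a simple per-test difficulty, taken here as the fraction of programs that fail the test. This difficulty measure and the function names are assumptions made for illustration; the sketch does not reproduce SFIMX itself, which evaluates only part of the matrix and estimates the rest by matrix factorization.

```python
def interaction_matrix(programs, tests):
    """Rows are programs, columns are tests; 1 means the program passes."""
    return [[1 if prog(x) == expected else 0 for x, expected in tests]
            for prog in programs]


def difficulty_per_test(matrix):
    """Fraction of programs failing each test (an assumed difficulty measure)."""
    n_programs = len(matrix)
    return [1 - sum(column) / n_programs for column in zip(*matrix)]


# Toy task: the target behaviour is f(x) = x * x.
tests = [(0, 0), (1, 1), (2, 4), (3, 9)]
programs = [
    lambda x: x,
    lambda x: x * x,
    lambda x: 2 * x,
    lambda x: 0,
]

matrix = interaction_matrix(programs, tests)
difficulty = difficulty_per_test(matrix)
print(matrix)      # [[1, 1, 0, 0], [1, 1, 1, 1], [1, 0, 1, 0], [1, 0, 0, 0]]
print(difficulty)  # [0.0, 0.5, 0.5, 0.75]
```

A difficulty vector of this kind is one way a method could bias which tests are drawn for evaluation, for example by sampling harder tests more often.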


Author(s): ZAHRA NIKDEL, HAMID BEIGY

In this paper, we introduce a new hybrid learning algorithm, called DTGP, to construct cost-sensitive classifiers. This algorithm uses a decision tree as its basic classifier; the constructed decision tree is then pruned by a genetic programming algorithm using a fitness function that is sensitive to misclassification costs. The proposed learning algorithm has been evaluated on six cost-sensitive problems. The experimental results show that it outperforms several other well-known learning algorithms, such as C4.5 and naïve Bayes.


2013
Vol 11 (07)
pp. 1350067
Author(s): PRZEMYSŁAW SADOWSKI

In this paper, we provide a genetic programming (GP) based method for generating quantum circuits that prepare maximally multipartite entangled states (MMES). The presented method is faster than known realizations thanks to the applied fitness function and several modifications to the GP schema. Moreover, the method offers the unique possibility of defining an arbitrary structure for the system. We use the developed method to find new quantum circuits that are simpler than previously known ones. We also analyze the efficiency of generating entanglement in the spin chain system and in the system of complete connections.

