Parameter identification for symbolic regression using nonlinear least squares

2019 ◽  
Vol 21 (3) ◽  
pp. 471-501 ◽  
Author(s):  
Michael Kommenda ◽  
Bogdan Burlacu ◽  
Gabriel Kronberger ◽  
Michael Affenzeller

Abstract In this paper we analyze the effects of using nonlinear least squares for parameter identification of symbolic regression models and integrate it as a local search mechanism in tree-based genetic programming. We employ the Levenberg–Marquardt algorithm for parameter optimization and calculate gradients via automatic differentiation. We provide examples where parameter identification succeeds and where it fails, and highlight its computational overhead. Using an extensive suite of symbolic regression benchmark problems, we demonstrate the increased performance obtained by incorporating nonlinear least squares within genetic programming. Our results are compared with recently published results obtained by several genetic programming variants and state-of-the-art machine learning algorithms. Genetic programming with nonlinear least squares performs among the best on the defined benchmark suite, and the local search can easily be integrated into different genetic programming algorithms as long as only differentiable functions are used within the models.
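To make this local-search step concrete, here is a minimal pure-Python sketch, not the authors' implementation, of Levenberg–Marquardt fitting of the numeric constants of one fixed symbolic regression expression, with the Jacobian obtained by forward-mode automatic differentiation using dual numbers. The expression `theta0 * exp(theta1 * x)`, the helper names (`Dual`, `dexp`, `residuals_and_jacobian`, `levenberg_marquardt`), the damping schedule (divide or multiply λ by 10), and the synthetic data are all illustrative assumptions.

```python
import math


class Dual:
    """Forward-mode AD number: a value and its derivative w.r.t. one seeded parameter."""

    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(float(o))
        return Dual(self.val + o.val, self.der + o.der)

    __radd__ = __add__

    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(float(o))
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)  # product rule

    __rmul__ = __mul__


def dexp(d):
    """exp() on dual numbers (chain rule)."""
    e = math.exp(d.val)
    return Dual(e, e * d.der)


def model(x, theta):
    """Hypothetical evolved expression with numeric constants: theta0 * exp(theta1 * x)."""
    return theta[0] * dexp(theta[1] * x)


def residuals_and_jacobian(theta, xs, ys):
    """Residuals r_i = y_i - f(x_i) and Jacobian J_ik = df(x_i)/dtheta_k via forward-mode AD."""
    r, J = [], []
    for x, y in zip(xs, ys):
        row = []
        for k in range(len(theta)):
            # Seed derivative 1 on parameter k and 0 on all others.
            seeded = [Dual(t, 1.0 if i == k else 0.0) for i, t in enumerate(theta)]
            out = model(x, seeded)
            row.append(out.der)
        r.append(y - out.val)
        J.append(row)
    return r, J


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system A x = b."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            for j in range(c, n + 1):
                M[i][j] -= f * M[c][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def levenberg_marquardt(theta, xs, ys, iters=200, lam=1e-3):
    theta = list(theta)
    r, J = residuals_and_jacobian(theta, xs, ys)
    cost = sum(v * v for v in r)
    n = len(theta)
    for _ in range(iters):
        # Damped normal equations: (J^T J + lam * diag(J^T J)) delta = J^T r
        A = [[sum(row[a] * row[b] for row in J) for b in range(n)] for a in range(n)]
        g = [sum(row[a] * ri for row, ri in zip(J, r)) for a in range(n)]
        for a in range(n):
            A[a][a] *= 1.0 + lam
        cand = [t + d for t, d in zip(theta, solve(A, g))]
        try:
            r_new, J_new = residuals_and_jacobian(cand, xs, ys)
            new_cost = sum(v * v for v in r_new)
        except OverflowError:  # exp() overflowed: treat as a failed step
            new_cost = math.inf
        if new_cost < cost:  # accept and lean towards Gauss-Newton
            theta, r, J, cost = cand, r_new, J_new, new_cost
            lam /= 10.0
        else:  # reject and lean towards gradient descent
            lam *= 10.0
            if lam > 1e12:  # many consecutive rejections: no further progress
                break
    return theta, cost


xs = [0.5 * i for i in range(9)]
ys = [2.0 * math.exp(0.5 * x) for x in xs]  # synthetic, noise-free data
theta, cost = levenberg_marquardt([1.0, 0.3], xs, ys)
print(theta, cost)
```

Inside a GP run, a routine like this would be applied to each individual's constants before its fitness is computed. Forward mode as written needs one model pass per constant, so the cost of this sketch grows with the number of constants in the tree.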

2014 ◽  
Vol 22 (2) ◽  
pp. 287-317 ◽  
Author(s):  
Raja Muhammad Atif Azad ◽  
Conor Ryan

Genetic programming (GP) coarsely models natural evolution to evolve computer programs. Unlike in nature, where individuals can often improve their fitness through lifetime experience, the fitness of GP individuals generally does not change during their lifetime, and there is usually no opportunity to pass on acquired knowledge. This paper introduces the Chameleon system to address this discrepancy and augment GP with lifetime learning by adding a simple local search that operates by tuning the internal nodes of individuals. Although Chameleon is not the first attempt to combine local search with GP, its simplicity makes it easy to understand and cheap to implement. A simple cache, which leverages the local search, reduces the tuning cost to a small fraction of the expected cost. We also provide a theoretical upper limit on the maximum tuning expense given the average tree size of the population and show that this limit grows very conservatively as the average tree size increases. We show that Chameleon uses the available genetic material more efficiently by exploring more actively than standard GP, and demonstrate that Chameleon not only outperforms standard GP (on both training and test data) on a number of symbolic-regression-type problems, but does so while producing smaller individuals. It also works harmoniously with two other well-known extensions to GP, namely linear scaling and a diversity-promoting tournament selection method.
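The abstract does not spell out Chameleon's tuning move or its cache design, so the following is only a hypothetical sketch of the general idea: a hill-climbing pass that tries every alternative binary operator at each internal node and keeps a change only if the training error drops, with a cache that memoizes subtree outputs so unchanged subtrees are not re-evaluated. The tree encoding, the operator set, and the names `evaluate`, `relabel`, and `tune` are assumptions.

```python
import operator

# Hypothetical tree encoding: ("x",) is the input variable, ("const", c) a constant,
# (op, left, right) an internal node whose op is a key of OPS.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def evaluate(tree, xs, cache):
    """Output vector of `tree` on inputs `xs`; memoised per subtree (cache is tied to xs)."""
    if tree in cache:
        return cache[tree]
    kind = tree[0]
    if kind == "x":
        out = tuple(xs)
    elif kind == "const":
        out = tuple(tree[1] for _ in xs)
    else:
        left, right = evaluate(tree[1], xs, cache), evaluate(tree[2], xs, cache)
        out = tuple(OPS[kind](a, b) for a, b in zip(left, right))
    cache[tree] = out
    return out


def sse(tree, xs, ys, cache):
    return sum((y - f) ** 2 for y, f in zip(ys, evaluate(tree, xs, cache)))


def internal_paths(tree, path=()):
    """Yield the child-index path to every internal node, root first."""
    if tree[0] in OPS:
        yield path
        yield from internal_paths(tree[1], path + (1,))
        yield from internal_paths(tree[2], path + (2,))


def relabel(tree, path, op):
    """Return a copy of `tree` with the operator at `path` replaced by `op`."""
    if not path:
        return (op,) + tree[1:]
    children = list(tree)
    children[path[0]] = relabel(tree[path[0]], path[1:], op)
    return tuple(children)


def tune(tree, xs, ys, cache):
    """One hill-climbing pass over internal nodes, keeping only improving operator swaps."""
    best, best_err = tree, sse(tree, xs, ys, cache)
    for path in list(internal_paths(tree)):
        for op in OPS:
            cand = relabel(best, path, op)
            err = sse(cand, xs, ys, cache)
            if err < best_err:
                best, best_err = cand, err
    return best, best_err


xs = [1, 2, 3, 4]
ys = [x * x + x for x in xs]                      # target behaviour
start = ("sub", ("add", ("x",), ("x",)), ("x",))  # (x + x) - x, i.e. x
cache = {}
tuned, err = tune(start, xs, ys, cache)
print(tuned, err)
```

Because relabeling never changes the tree's shape, the node paths computed from the starting tree stay valid for the whole pass; the cache is only valid for one fixed input vector `xs`.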


Author(s):  
Keiko Ono ◽  
Yoshiko Hanada ◽  
Masahito Kumano ◽  
Masahiro Kimura

Abstract In evolutionary computation approaches such as genetic programming (GP), preventing premature convergence to local minima is known to improve performance. As with other evolutionary computation methods, it can be difficult to construct an effective search bias in GP that avoids local minima. In particular, it is difficult to determine which features are the most suitable for the search bias, because GP solutions are expressed as trees and have multiple features. A common approach intended to avoid local minima is the Island Model, which generates multiple populations to encourage a global search and enhance genetic diversity. To improve the Island Model in the framework of GP, we propose a novel technique that uses a migration strategy based on frequent trees and a local search, where frequent trees are subtrees that appear multiple times among the individuals in an island. The proposed method evaluates each island by measuring its activation level in terms of the fitness value and the number of types of frequent trees that have been created. Several individuals are then migrated from an island with a high activation level to an island with a low activation level, and vice versa. The proposed method also incorporates strong partial solutions found by a local search. Using six kinds of benchmark problems widely adopted in the literature, we demonstrate that incorporating frequent-tree information into the migration strategy, together with local search, effectively improves performance. The proposed method is shown to significantly outperform both a typical Island Model GP and the age-layered population structure (ALPS) method.
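The frequent-tree count and the activation-based migration described above can be sketched as follows. The abstract does not give the exact activation formula, so the weighted sum of mean fitness and the number of frequent-tree types (`alpha`), the minimum count of 2, the exchange of the `k` best individuals replacing the receiver's worst, and the size-based toy fitness are all hypothetical choices for illustration; the local-search component is omitted.

```python
from collections import Counter

# Hypothetical encoding: leaves are strings, internal nodes are (op, child, child, ...).


def subtrees(tree):
    yield tree
    if isinstance(tree, tuple):
        for child in tree[1:]:
            yield from subtrees(child)


def frequent_tree_types(island, min_count=2):
    """Number of distinct internal subtrees present in at least `min_count` individuals."""
    counts = Counter()
    for ind in island:
        counts.update({t for t in subtrees(ind) if isinstance(t, tuple)})
    return sum(1 for c in counts.values() if c >= min_count)


def activation(island, fitness, alpha=1.0):
    """Hypothetical activation level: mean fitness plus alpha * frequent-tree types."""
    mean_fit = sum(fitness(ind) for ind in island) / len(island)
    return mean_fit + alpha * frequent_tree_types(island)


def migrate(islands, fitness, k=1, alpha=1.0):
    """Exchange the k best individuals between the most and least activated islands;
    migrants replace the k worst individuals of the receiving island."""
    acts = [activation(isl, fitness, alpha) for isl in islands]
    hi = max(range(len(islands)), key=lambda i: acts[i])
    lo = min(range(len(islands)), key=lambda i: acts[i])
    out = [list(isl) for isl in islands]
    if hi == lo:
        return out
    from_hi = sorted(islands[hi], key=fitness, reverse=True)[:k]
    from_lo = sorted(islands[lo], key=fitness, reverse=True)[:k]
    for dest, migrants in ((lo, from_hi), (hi, from_lo)):
        survivors = sorted(out[dest], key=fitness, reverse=True)[:len(out[dest]) - k]
        out[dest] = survivors + migrants
    return out


def neg_size(tree):
    """Toy fitness for the example: smaller trees are better."""
    return -sum(1 for _ in subtrees(tree))


island_a = [("add", "x", "x"), ("add", "x", "x"), ("mul", "x", ("add", "x", "x"))]
island_b = [("sub", "x", "x"), ("mul", "x", "x")]
new_a, new_b = migrate([island_a, island_b], neg_size, k=1)
print(new_a, new_b)
```

Counting each subtree at most once per individual keeps one large individual with a repeated motif from dominating the frequent-tree count; whether the original method does the same is not stated in the abstract.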


Author(s):  
A. Artoni ◽  
M. Gabiccini ◽  
M. Guiggiani

This paper outlines a systematic methodology for finding the machine setting corrections required to obtain a predesigned ease-off surface in spiral bevel and hypoid gear teeth. The problem is given a nonlinear least squares formulation, which, however, is highly prone to numerical instabilities. The Levenberg–Marquardt algorithm with a trust region strategy proved effective and robust in obtaining feasible solutions. The proposed method was tested on lengthwise crowning, profile crowning, and spiral angle correction. In all cases, the goal was achieved with very high accuracy, in a few iterations, and, remarkably, with different sets of machine parameters.


2020 ◽  
pp. 1-28
Author(s):  
Pak-Kan Wong ◽  
Man-Leung Wong ◽  
Kwong-Sak Leung

Genetic Programming is a method to automatically create computer programs based on the principles of evolution. Deceptiveness caused by complex dependencies among program components is a challenging problem. It is important because it can mislead Genetic Programming into creating sub-optimal programs. Moreover, a minor modification to a program may lead to a notable change in its behaviour and affect the final outputs. This paper presents Grammar-based Genetic Programming with Bayesian Classifiers (GBGPBC), in which the probabilistic dependencies among program components are captured using a set of Bayesian network classifiers. Our system was evaluated on a set of benchmark problems (the deceptive maximum problems, the royal tree problems, and the bipolar asymmetric royal tree problems). It was shown to be often more robust and more efficient in searching for the best programs than other related Genetic Programming approaches in terms of the total number of fitness evaluations. We studied which factors affect the performance of GBGPBC and found that robust variants of GBGPBC were consistently weakly correlated with some complexity measures. Furthermore, our approach has been applied to learn a ranking program for a set of customers in direct marketing. Our suggested solutions help companies earn significantly more than solutions produced by several well-known machine learning algorithms, such as neural networks, logistic regression, and Bayesian networks.


2020 ◽  
Author(s):  
Andreas Papritz ◽  
Peter Lehmann ◽  
Surya Gupta ◽  
Sara Bonetti ◽
Dani Or

The representation of land surface properties in hydrologic and climatic models critically depends on soil hydraulic functions (SHF). Parameters of SHF are routinely identified from soil water retention (SWR) and hydraulic conductivity (HC) data by nonlinear least squares. This is a notoriously difficult task because typically only a few measurements are available per sample or plot for estimating the many SHF parameters (up to seven for the van Genuchten–Mualem model). As a consequence, the estimated parameters are often highly uncertain and can yield unrealistic predictions of related physical quantities such as the characteristic length L_c for stage-1 evaporation (Lehmann et al., 2008). We address these limitations by capitalizing on the conditional linearity of some of the SHF parameters. Conditionally linear parameters, say μ, can be replaced in the least squares objective by their explicit estimate (Bates & Watts, 1988), leading to an objective that depends only on the remaining nonlinear parameters ν. This step substantially reduces the dimensionality of the SHF estimation problem and improves the quality of the estimated parameters. Additionally, instead of minimizing the least squares objective with box constraints on ν only, we minimize it with nonlinear programming algorithms that allow the estimates of ν to be physically constrained by L_c. We have implemented this estimation approach in an R software package capable of processing global SWR and HC data. Using ensemble machine learning algorithms, the new parameter estimates will be coupled with auxiliary covariates (vegetation, climate) to create improved global maps of SHF parameters.

References:

Bates, D. M., & Watts, D. G. (1988). Nonlinear Regression Analysis and Its Applications. John Wiley & Sons, New York.

Lehmann, P., Assouline, S., & Or, D. (2008). Characteristic lengths affecting evaporative drying of porous media. Physical Review E, 77, 056309. DOI 10.1103/PhysRevE.77.056309.
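The conditional-linearity idea can be illustrated on the van Genuchten retention curve θ(h) = θr + (θs − θr)[1 + (αh)^n]^(−m) with m = 1 − 1/n, in which θr and θs enter linearly once ν = (α, n) is fixed. The sketch below, written in Python rather than the authors' R package, solves the 2×2 normal equations for μ = (θr, θs) in closed form and searches only over ν. A plain grid search over a box stands in for the nonlinear programming algorithms and the L_c constraint mentioned in the abstract; the grid, the suction heads, and the synthetic data are assumptions.

```python
def effective_saturation(h, alpha, n):
    """van Genuchten effective saturation [1 + (alpha*h)^n]^(-m) with m = 1 - 1/n (h > 0)."""
    m = 1.0 - 1.0 / n
    return (1.0 + (alpha * h) ** n) ** (-m)


def profile_sse(alpha, n, hs, thetas):
    """For fixed nu = (alpha, n), estimate the conditionally linear mu = (theta_r, theta_s)
    in closed form and return (SSE, theta_r, theta_s).

    theta(h) = theta_r * (1 - S) + theta_s * S is linear in (theta_r, theta_s), so the
    least squares estimate solves a 2x2 system of normal equations (Cramer's rule)."""
    s = [effective_saturation(h, alpha, n) for h in hs]
    a = [1.0 - v for v in s]
    saa = sum(v * v for v in a)
    sss = sum(v * v for v in s)
    sas = sum(u * v for u, v in zip(a, s))
    say = sum(u * t for u, t in zip(a, thetas))
    ssy = sum(v * t for v, t in zip(s, thetas))
    det = saa * sss - sas * sas
    theta_r = (say * sss - sas * ssy) / det
    theta_s = (saa * ssy - sas * say) / det
    sse = sum((t - theta_r * u - theta_s * v) ** 2 for t, u, v in zip(thetas, a, s))
    return sse, theta_r, theta_s


def fit_grid(hs, thetas, alphas, ns):
    """Minimise the profiled objective over nu only; the grid bounds act as box constraints
    (a simple stand-in for a constrained nonlinear optimiser, no L_c constraint here)."""
    best = None
    for alpha in alphas:
        for n in ns:
            sse, tr, ts = profile_sse(alpha, n, hs, thetas)
            if best is None or sse < best[0]:
                best = (sse, alpha, n, tr, ts)
    return best


hs = [1.0, 10.0, 30.0, 100.0, 300.0, 1000.0, 3000.0, 15000.0]  # suction heads (e.g. cm)
thetas = [0.05 + 0.40 * effective_saturation(h, 0.05, 1.8) for h in hs]  # synthetic SWR data
alphas = [0.01 * k for k in range(1, 21)]  # 0.01 .. 0.20
ns = [1.1 + 0.1 * k for k in range(20)]    # 1.1 .. 3.0
sse, alpha, n, theta_r, theta_s = fit_grid(hs, thetas, alphas, ns)
print(sse, alpha, n, theta_r, theta_s)
```

The search space shrinks from four parameters to two, which is the kind of dimensionality reduction described above; with real, noisy data one would replace the grid by a constrained optimizer.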

