symbolic regression
Recently Published Documents


Total documents: 536 (five years: 231)
H-index: 22 (five years: 8)

Pomorstvo ◽  
2021 ◽  
Vol 35 (2) ◽  
pp. 287-296
Author(s):  
Sandi Baressi Šegota ◽  
Ivan Lorencin ◽  
Mario Šercer ◽  
Zlatan Car

Determining the residuary resistance per unit weight of displacement is one of the key factors in the design of vessels. In this paper, the authors apply two novel methods, Symbolic Regression (SR) and Gradient Boosted Trees (GBT), to obtain a model that calculates the residuary resistance per unit weight of displacement from the longitudinal position of the center of buoyancy, prismatic coefficient, length-displacement ratio, beam-draught ratio, length-beam ratio, and Froude number. The data comprise the results of 308 experiments provided as part of a publicly available dataset. The results are evaluated using the coefficient of determination (R²) and the Mean Absolute Percentage Error (MAPE). Pre-processing, in the form of correlation analysis combined with variable elimination and variable scaling, is applied to the dataset. The results show that while both methods produce a regression model, SR performs relatively poorly in comparison to GBT. Both methods give slightly poorer, but comparable, results to previous research focusing on "black-box" methods such as neural networks. Variable elimination has little influence on modeling performance in the presented case, while variable scaling yields better results than models trained on the non-scaled dataset.
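The two evaluation metrics named in this abstract can be written out in a few lines. The following is a minimal sketch using the standard textbook definitions of R² and MAPE; the function names and the sample values are illustrative assumptions and do not come from the paper or its dataset.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent (undefined if any y_true is 0)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

# Made-up values for illustration only, not data from the paper.
y_true = np.array([2.0, 4.0, 5.0, 10.0])
y_pred = np.array([2.2, 3.6, 5.0, 9.0])

r2_value = r2_score(y_true, y_pred)
mape_value = mape(y_true, y_pred)
print(f"R2 = {r2_value:.4f}, MAPE = {mape_value:.2f}%")
```

Because MAPE divides by the true value, it grows very large when targets are close to zero, which is one reason it is usually reported alongside R² rather than on its own.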


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Martin Brablc ◽  
Jan Žegklitz ◽  
Robert Grepl ◽  
Robert Babuška

Reinforcement learning (RL) agents can learn to control a nonlinear system without using a model of the system. However, having a model brings benefits, mainly a reduced number of unsuccessful trials before acceptable control performance is achieved. Several modelling approaches have been used in the RL domain, such as neural networks, local linear regression, or Gaussian processes. In this article, we focus on techniques that have not been used much so far: symbolic regression (SR), based on genetic programming, and local modelling. Using measured data, symbolic regression yields a nonlinear, continuous-time analytic model. We benchmark two state-of-the-art methods, SNGP (single-node genetic programming) and MGGP (multigene genetic programming), against a standard incremental local regression method called RFWR (receptive field weighted regression). We have modified the RFWR algorithm to better suit the low-dimensional continuous-time systems we are mostly dealing with. The benchmark is a nonlinear, dynamic magnetic manipulation system. The results show that, using the RL framework and a suitable approximation method, it is possible to design a stable controller for such a complex system without the need for any haphazard learning. While all of the approximation methods were successful, MGGP achieved the best results at the cost of higher computational complexity. Index Terms: AI-based methods, local linear regression, nonlinear systems, magnetic manipulation, model learning for control, optimal control, reinforcement learning, symbolic regression.
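To give a feel for the local modelling idea mentioned in this abstract, here is a minimal sketch of batch locally weighted linear regression with a Gaussian kernel. It is a simplified relative of the approach, not the incremental RFWR algorithm or the authors' modified version; the function name, the bandwidth value, and the sine test data are assumptions made for illustration.

```python
import numpy as np

def locally_weighted_prediction(x_train, y_train, x_query, bandwidth=0.3):
    """Fit a weighted straight line around x_query and return its value there."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    # Gaussian weights: samples near the query point dominate the fit.
    w = np.exp(-0.5 * ((x_train - x_query) / bandwidth) ** 2)
    # Local linear model centred on the query: y ~ slope * (x - x_query) + intercept.
    A = np.column_stack([x_train - x_query, np.ones_like(x_train)])
    sw = np.sqrt(w)
    # Weighted least squares via row scaling by sqrt(w).
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y_train * sw, rcond=None)
    return float(coef[1])  # the intercept is the prediction at x_query

# Illustrative data only: a smooth nonlinear function sampled on a grid.
x_train = np.linspace(0.0, 2.0 * np.pi, 200)
y_train = np.sin(x_train)
pred = locally_weighted_prediction(x_train, y_train, x_query=1.0)
print(f"local prediction at x=1.0: {pred:.3f}, true value: {np.sin(1.0):.3f}")
```

A smaller bandwidth tracks curvature more closely but uses fewer effective samples, so it is more sensitive to noise.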


Computation ◽  
2021 ◽  
Vol 9 (12) ◽  
pp. 139
Author(s):  
Olga Kochueva ◽  
Kirill Nikolskii

Predictive emission monitoring systems (PEMS) are software solutions for validating and supplementing costly continuous emission monitoring systems for natural gas electrical generation turbines. PEMS are based on predictive models trained on past data to estimate emission components. The gas turbine process dataset from the University of California at Irvine open data repository has prompted a challenge of sorts: to compare the quality of models built with various machine learning methods for predicting CO and NOx emissions from ambient variables and the parameters of the technological process. The novelty and features of this paper are: (i) a contribution to the study of the features of the open dataset on CO and NOx emissions for gas turbines, which will enable a more objective comparison of different machine learning methods in further research; (ii) for the first time for CO and NOx emissions, a model based on symbolic regression and a genetic algorithm is presented, whose advantage is the transparency of the influence of factors and the interpretability of the model; (iii) a new classification model based on the symbolic regression model and a fuzzy inference system is proposed. The coefficients of determination of the developed models are R² = 0.83 for NOx emissions and R² = 0.89 for CO emissions.


2021 ◽  
Author(s):  
Baligh Al-Helali

Symbolic regression is the process of constructing mathematical expressions that best fit given data sets, where a target variable is expressed in terms of input variables. Unlike traditional regression methods, which optimise the parameters of pre-defined models, symbolic regression learns both the model structure and its parameters simultaneously.

Genetic programming (GP) is a biologically-inspired evolutionary algorithm, that automatically generates computer programs to solve a given task. The flexible representation of GP along with its "white box" nature makes it a dominant method for symbolic regression. Moreover, GP has been successfully employed for different learning tasks such as feature selection and transfer learning.

Data incompleteness is a pervasive problem in symbolic regression, and machine learning in general, especially when dealing with real-world data sets. One common approach to handling data missingness is data imputation. Data imputation is the process of estimating missing values based on existing data. Another approach to deal with incomplete data is to build learning algorithms that directly work with missing values.

Although a number of methods have been proposed to tackle the data missingness issue in machine learning, most studies focus on classification tasks. Little attention has been paid to symbolic regression on incomplete data. The existing symbolic regression methods are only applicable when the given data set is complete.

The overall goal of the thesis is to improve the performance of symbolic regression on incomplete data by using GP for data imputation, instance selection, feature selection, and transfer learning.

This thesis develops an imputation method to handle missing values for symbolic regression. The method integrates the instance-based similarity of the k-nearest neighbour method with the feature-based predictability of GP to estimate the missing values. The results show that the proposed method outperforms existing popular imputation methods.

This thesis develops an instance selection method for improving imputation for symbolic regression on incomplete data. The proposed method has the ability to simultaneously build imputation and symbolic regression models such that the performance is improved. The results show that involving instance selection with imputation advances the performance of using the imputation alone.

High-dimensionality is a serious data challenge, which is even more difficult on incomplete data. To address this problem in symbolic regression tasks, this thesis develops a feature selection method that can select a good set of features directly from incomplete data. The method not only improves the regression accuracy, but also enhances the efficiency of symbolic regression on high-dimensional incomplete data.

Another challenging problem is data shortage. This issue is even more challenging when the data is incomplete. To handle this situation, this thesis develops transfer learning methods to improve symbolic regression in domains with incomplete and limited data. These methods utilise two powerful abilities of GP: feature construction and feature selection. The results show the ability of these methods to achieve positive transfer learning from domains with complete data to different (but related) domains with incomplete data.

In summary, the thesis develops a range of approaches to improving the effectiveness and efficiency of symbolic regression on incomplete data by developing a number of GP-based methods. The methods are evaluated using different types of data sets considering various missingness and learning scenarios.
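The statement that symbolic regression learns both the model structure and its parameters can be illustrated with a toy search. The sketch below enumerates a small, hand-picked list of candidate expression forms and fits the coefficients of each by least squares. It is an assumption-laden illustration rather than genetic programming and not the thesis's method; the candidate list, function names, and synthetic data are all hypothetical.

```python
import numpy as np

# Hypothetical candidate structures: each maps x to one feature f(x),
# and the model form is y ~ a * f(x) + b.
CANDIDATES = {
    "a*x + b": lambda x: x,
    "a*x**2 + b": lambda x: x ** 2,
    "a*sin(x) + b": np.sin,
    "a*exp(x) + b": np.exp,
}

def fit_structure(f, x, y):
    """Fit parameters (a, b) for one fixed structure and return them with the MSE."""
    A = np.column_stack([f(x), np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    mse = float(np.mean((A @ coef - y) ** 2))
    return coef, mse

def toy_symbolic_regression(x, y):
    """Search over structures; keep the one whose fitted parameters give the lowest MSE."""
    best = None
    for name, f in CANDIDATES.items():
        coef, mse = fit_structure(f, x, y)
        if best is None or mse < best[2]:
            best = (name, coef, mse)
    return best

# Synthetic, noise-free data generated from a known expression.
x = np.linspace(-2.0, 2.0, 50)
y = 3.0 * x ** 2 - 1.0

name, coef, mse = toy_symbolic_regression(x, y)
print(f"selected structure: {name}, a = {coef[0]:.3f}, b = {coef[1]:.3f}")
```

A GP-based system replaces the fixed candidate list with an evolving population of expression trees, which is what makes the search space open-ended.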


2021 ◽  
Vol 104 (23) ◽  
Author(s):  
Cole Miles ◽  
Matthew R. Carbone ◽  
Erica J. Sturm ◽  
Deyu Lu ◽  
Andreas Weichselbaum ◽  
...  

2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Hanbo Zhu ◽  
Changqing Miao

In fragility analysis, researchers have mostly chosen and constructed seismic intensity measures (IMs) according to past experience and personal preference, resulting in large dispersion between the sample of the engineering demand parameter (EDP) and the regression function with IM as the independent variable. This problem needs to be solved urgently. Firstly, 46 existing types of ground motion intensity measures were taken as a candidate set, and composite intensity measures (CIMs) were selected and constructed using machine learning methods. Secondly, the modified Park–Ang damage index was taken as the EDP, and the symbolic regression method was used to fit the functional relationship between the CIMs and the EDP. Finally, probabilistic seismic demand analysis (PSDA) and seismic fragility analysis were performed by the cloud-stripe method. Taking the pier of a three-span continuous reinforced concrete hollow slab bridge as an example, a nonlinear finite element model was established for vulnerability analysis. The proposed composite IM and its functional form were compared with the linear composite IMs constructed by Kiani, Lu Dagang, and Liu Tingting. The analysis results indicate that the standard deviation of the fragility curve for the composite IM proposed in this paper is 60% to 70% smaller than that of the other composite indicators, which verifies the efficiency, practicality, proficiency, and sufficiency of the proposed fusion of machine learning and symbolic regression in constructing composite IMs.

