Symbolic Regression: Latest Research Papers

SALAMANDER: Simulating and Leveraging Autonomous Model Augmentation Using Neural Differential Equations and (Symbolic) Regression

10.2514/6.2022-1763 ◽

2022 ◽

Author(s):

Jordan Murphy ◽

Daniel J. Scheeres

Keyword(s):

Differential Equations ◽

Symbolic Regression

Simulation-Based Optimization of Residential Energy Flows Using Genetic Programming to Solve a Symbolic Regression Problem

Energy and Buildings ◽

10.1016/j.enbuild.2021.111829 ◽

2022 ◽

pp. 111829

Author(s):

Kathrin Kefer ◽

Roland Hanghofer ◽

Patrick Kefer ◽

Markus Stöger ◽

Bernd Hofer ◽

...

Keyword(s):

Genetic Programming ◽

Symbolic Regression ◽

Residential Energy ◽

Regression Problem ◽

Energy Flows ◽

Simulation Based ◽

Simulation Based Optimization

Determining residuary resistance per unit weight of displacement with Symbolic Regression and Gradient Boosted Tree algorithms

Pomorstvo ◽

10.31217/p.35.2.11 ◽

2021 ◽

Vol 35 (2) ◽

pp. 287-296

Author(s):

Sandi Baressi Šegota ◽

Ivan Lorencin ◽

Mario Šercer ◽

Zlatan Car

Keyword(s):

Symbolic Regression ◽

Unit Weight ◽

Coefficient Of Determination ◽

Percentage Error ◽

Key Factors ◽

Variable Elimination ◽

Tree Algorithms ◽

Boosted Tree ◽

Longitudinal Position ◽

Elimination Of Variables

Determining the residuary resistance per unit weight of displacement is one of the key factors in the design of vessels. In this paper, the authors utilize two novel methods, Symbolic Regression (SR) and Gradient Boosted Trees (GBT), to obtain a model that calculates the residuary resistance per unit weight of displacement from the longitudinal position of the center of buoyancy, prismatic coefficient, length-displacement ratio, beam-draught ratio, length-beam ratio, and Froude number. The data comprise the results of 308 experiments provided as part of a publicly available dataset. The results are evaluated using the coefficient of determination (R²) and Mean Absolute Percentage Error (MAPE). Pre-processing, in the form of correlation analysis combined with variable elimination and variable scaling, is applied to the dataset. The results show that while both methods yield regression models, SR performs relatively poorly in comparison to GBT. Both methods provide slightly poorer, but comparable, results to previous research focusing on the use of “black-box” methods, such as neural networks. The elimination of variables does not show a high influence on the modeling performance in the presented case, while variable scaling does achieve better results compared to the models trained with the non-scaled dataset.
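
The two evaluation metrics and the scaling step named in this abstract can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code; the function names and the choice of min-max scaling are assumptions made for the example (the abstract does not say which scaling was used).

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent.

    Assumes no true value is zero, otherwise the ratio is undefined.
    """
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


def min_max_scale(column):
    """Rescale one input variable to [0, 1] (assumed scaling scheme)."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]
```

A model that always predicts the mean of the targets scores R² = 0, and a perfect model scores R² = 1, which is why R² is usually reported alongside an error measure such as MAPE.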

Control of Magnetic Manipulator Using Reinforcement Learning Based on Incrementally Adapted Local Linear Models

Complexity ◽

10.1155/2021/6617309 ◽

2021 ◽

Vol 2021 ◽

pp. 1-12

Author(s):

Martin Brablc ◽

Jan Žegklitz ◽

Robert Grepl ◽

Robert Babuška

Keyword(s):

Reinforcement Learning ◽

Linear Regression ◽

Genetic Programming ◽

Continuous Time ◽

Symbolic Regression ◽

Local Linear Regression ◽

Model Learning ◽

Magnetic Manipulation ◽

Local Linear ◽

Multigene Genetic Programming

Reinforcement learning (RL) agents can learn to control a nonlinear system without using a model of the system. However, having a model brings benefits, mainly in terms of a reduced number of unsuccessful trials before achieving acceptable control performance. Several modelling approaches have been used in the RL domain, such as neural networks, local linear regression, or Gaussian processes. In this article, we focus on techniques that have not been used much so far: symbolic regression (SR), based on genetic programming and local modelling. Using measured data, symbolic regression yields a nonlinear, continuous-time analytic model. We benchmark two state-of-the-art methods, SNGP (single-node genetic programming) and MGGP (multigene genetic programming), against a standard incremental local regression method called RFWR (receptive field weighted regression). We have introduced modifications to the RFWR algorithm to better suit the low-dimensional continuous-time systems we are mostly dealing with. The benchmark is a nonlinear, dynamic magnetic manipulation system. The results show that using the RL framework and a suitable approximation method, it is possible to design a stable controller of such a complex system without the necessity of any haphazard learning. While all of the approximation methods were successful, MGGP achieved the best results at the cost of higher computational complexity. Index Terms: AI-based methods, local linear regression, nonlinear systems, magnetic manipulation, model learning for control, optimal control, reinforcement learning, symbolic regression.

Data Analysis and Symbolic Regression Models for Predicting CO and NOx Emissions from Gas Turbines

Computation ◽

10.3390/computation9120139 ◽

2021 ◽

Vol 9 (12) ◽

pp. 139

Author(s):

Olga Kochueva ◽

Kirill Nikolskii

Keyword(s):

Machine Learning ◽

Gas Turbines ◽

Symbolic Regression ◽

Monitoring Systems ◽

Nox Emissions ◽

Learning Methods ◽

Machine Learning Methods ◽

Model Based ◽

Emission Monitoring ◽

Electrical Generation

Predictive emission monitoring systems (PEMS) are software solutions for the validation and supplementation of costly continuous emission monitoring systems for natural gas electrical generation turbines. PEMS are based on predictive models trained on past data to estimate emission components. The gas turbine process dataset from the University of California at Irvine open data repository has initiated a challenge of sorts to investigate the quality of models built with various machine learning methods for predicting CO and NOx emissions depending on ambient variables and the parameters of the technological process. The novelty and features of this paper are: (i) a contribution to the study of the features of the open dataset on CO and NOx emissions for gas turbines, which will enable one to more objectively compare different machine learning methods for further research; (ii) for the first time for the CO and NOx emissions, a model based on symbolic regression and a genetic algorithm is presented, the advantage of this being the transparency of the influence of factors and the interpretability of the model; (iii) a new classification model based on the symbolic regression model and fuzzy inference system is proposed. The coefficients of determination of the developed models are R² = 0.83 for NOx emissions and R² = 0.89 for CO emissions.

Genetic Programming for Symbolic Regression on Incomplete Data

10.26686/wgtn.17150609.v1 ◽

2021 ◽

Author(s):

Baligh Al-Helali

Keyword(s):

Feature Selection ◽

Transfer Learning ◽

Incomplete Data ◽

Missing Values ◽

Selection Method ◽

Symbolic Regression ◽

Instance Selection ◽

Data Sets ◽

Data Imputation ◽

Regression Methods

Symbolic regression is the process of constructing mathematical expressions that best fit given data sets, where a target variable is expressed in terms of input variables. Unlike traditional regression methods, which optimise the parameters of pre-defined models, symbolic regression learns both the model structure and its parameters simultaneously. Genetic programming (GP) is a biologically-inspired evolutionary algorithm that automatically generates computer programs to solve a given task. The flexible representation of GP along with its “white box” nature makes it a dominant method for symbolic regression. Moreover, GP has been successfully employed for different learning tasks such as feature selection and transfer learning. Data incompleteness is a pervasive problem in symbolic regression, and machine learning in general, especially when dealing with real-world data sets. One common approach to handling data missingness is data imputation. Data imputation is the process of estimating missing values based on existing data. Another approach to dealing with incomplete data is to build learning algorithms that directly work with missing values. Although a number of methods have been proposed to tackle the data missingness issue in machine learning, most studies focus on classification tasks. Little attention has been paid to symbolic regression on incomplete data. The existing symbolic regression methods are only applicable when the given data set is complete. The overall goal of the thesis is to improve the performance of symbolic regression on incomplete data by using GP for data imputation, instance selection, feature selection, and transfer learning. This thesis develops an imputation method to handle missing values for symbolic regression. The method integrates the instance-based similarity of the k-nearest neighbour method with the feature-based predictability of GP to estimate the missing values.
The results show that the proposed method outperforms existing popular imputation methods. This thesis develops an instance selection method for improving imputation for symbolic regression on incomplete data. The proposed method has the ability to simultaneously build imputation and symbolic regression models such that the performance is improved. The results show that involving instance selection with imputation advances the performance of using the imputation alone. High-dimensionality is a serious data challenge, which is even more difficult on incomplete data. To address this problem in symbolic regression tasks, this thesis develops a feature selection method that can select a good set of features directly from incomplete data. The method not only improves the regression accuracy, but also enhances the efficiency of symbolic regression on high-dimensional incomplete data. Another challenging problem is data shortage. This issue is even more challenging when the data is incomplete. To handle this situation, this thesis develops transfer learning methods to improve symbolic regression in domains with incomplete and limited data. These methods utilise two powerful abilities of GP: feature construction and feature selection. The results show the ability of these methods to achieve positive transfer learning from domains with complete data to different (but related) domains with incomplete data. In summary, the thesis develops a range of approaches to improving the effectiveness and efficiency of symbolic regression on incomplete data by developing a number of GP-based methods. The methods are evaluated using different types of data sets considering various missingness and learning scenarios.
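
The idea that symbolic regression learns both the model structure and its parameters can be illustrated with a deliberately tiny sketch. It is not the thesis's GP method: instead of evolving expression trees, it exhaustively tries a small, hand-picked set of candidate structures (the hypothetical `CANDIDATES` table) and fits the coefficients of `y ≈ a * f(x) + b` for each by closed-form least squares, keeping the structure with the lowest mean squared error.

```python
import math

# Hypothetical candidate structures f(x); a GP system would evolve
# expression trees rather than pick from a fixed list like this.
CANDIDATES = {
    "x": lambda x: x,
    "x**2": lambda x: x * x,
    "sin(x)": math.sin,
    "exp(x)": math.exp,
}


def fit_linear(features, targets):
    """Closed-form least squares for targets ~ a * feature + b."""
    n = len(features)
    mean_f = sum(features) / n
    mean_t = sum(targets) / n
    var_f = sum((f - mean_f) ** 2 for f in features)
    cov = sum((f - mean_f) * (t - mean_t) for f, t in zip(features, targets))
    a = cov / var_f if var_f > 0 else 0.0
    b = mean_t - a * mean_f
    return a, b


def toy_symbolic_regression(xs, ys):
    """Return (mse, structure_name, a, b) for the best-fitting candidate."""
    best = None
    for name, func in CANDIDATES.items():
        feats = [func(x) for x in xs]
        a, b = fit_linear(feats, ys)  # parameter search for this structure
        mse = sum((a * f + b - y) ** 2 for f, y in zip(feats, ys)) / len(ys)
        if best is None or mse < best[0]:  # structure search
            best = (mse, name, a, b)
    return best


if __name__ == "__main__":
    xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
    ys = [3.0 * x * x + 1.0 for x in xs]
    print(toy_symbolic_regression(xs, ys))
```

The sketch assumes complete data. Handling missing values, which is the subject of the thesis, would require an imputation step or a missing-value-aware fitness function before this search could run.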

Genetic Programming for Symbolic Regression on Incomplete Data

10.26686/wgtn.17150609 ◽

2021 ◽

Author(s):

Baligh Al-Helali

Keyword(s):

Feature Selection ◽

Transfer Learning ◽

Incomplete Data ◽

Missing Values ◽

Selection Method ◽

Symbolic Regression ◽

Instance Selection ◽

Data Sets ◽

Data Imputation ◽

Regression Methods

Machine learning of Kondo physics using variational autoencoders and symbolic regression

Physical Review B ◽

10.1103/physrevb.104.235111 ◽

2021 ◽

Vol 104 (23) ◽

Author(s):

Cole Miles ◽

Matthew R. Carbone ◽

Erica J. Sturm ◽

Deyu Lu ◽

Andreas Weichselbaum ◽

...

Keyword(s):

Machine Learning ◽

Symbolic Regression ◽

Kondo Physics

Seismic Fragility Analysis of the Reinforced Concrete Continuous Bridge Piers Based on Machine Learning and Symbolic Regression Fusion Algorithms

Shock and Vibration ◽

10.1155/2021/8969389 ◽

2021 ◽

Vol 2021 ◽

pp. 1-13

Author(s):

Hanbo Zhu ◽

Changqing Miao

Keyword(s):

Machine Learning ◽

Reinforced Concrete ◽

Damage Index ◽

Regression Function ◽

Element Model ◽

Seismic Intensity ◽

Symbolic Regression ◽

Fragility Analysis ◽

Seismic Fragility ◽

Intensity Measures

In fragility analysis, researchers have mostly chosen and constructed seismic intensity measures (IMs) according to past experience and personal preference, resulting in large dispersion between the samples of the engineering demand parameter (EDP) and the regression function with IM as the independent variable. This problem needs to be solved urgently. Firstly, the existing 46 types of ground motion intensity measures were taken as a candidate set, and composite intensity measures (CIMs) were selected and constructed using machine learning methods. Secondly, the modified Park–Ang damage index was taken as the EDP, and the symbolic regression method was used to fit the functional relationship between the CIMs and the EDP. Finally, probabilistic seismic demand analysis (PSDA) and seismic fragility analysis were performed by the cloud-stripe method. Taking the pier of a three-span continuous reinforced concrete hollow slab bridge as an example, a nonlinear finite element model was established for vulnerability analysis. The composite IM and its functional form were compared with the linear composite IMs constructed by Kiani, Lu Dagang, and Liu Tingting. The analysis results indicated that the standard deviation of the composite IM fragility curve proposed in this paper is 60% to 70% smaller than that of the other composite indicators, which verified the efficiency, practicality, proficiency, and sufficiency of the proposed machine learning and symbolic regression fusion algorithms in constructing composite IMs.

Establish algebraic data-driven constitutive models for elastic solids with a tensorial sparse symbolic regression method and a hybrid feature selection technique

Journal of the Mechanics and Physics of Solids ◽

10.1016/j.jmps.2021.104742 ◽

2021 ◽

pp. 104742

Author(s):

Mingchuan Wang ◽

Cai Chen ◽

Weijie Liu

Keyword(s):

Feature Selection ◽

Constitutive Models ◽

Symbolic Regression ◽

Regression Method ◽

Data Driven ◽

Elastic Solids ◽

Feature Selection Technique ◽

Selection Technique
