Feature selection to improve generalization of genetic programming for high-dimensional symbolic regression
When learning from high-dimensional data for symbolic regression (SR), genetic programming (GP) typically could not generalize well. Feature selection, as a data preprocessing method, can potentially contribute not only to improving the efficiency of learning algorithms but also to enhancing the generalization ability. However, in GP for high-dimensional SR, feature selection before learning is seldom considered. In this paper, we propose a new feature selection method based on permutation to select features for high-dimensional SR using GP. A set of experiments has been conducted to investigate the performance of the proposed method on the generalization of GP for high-dimensional SR. The regression results confirm the superior performance of the proposed method over the other examined feature selection methods. Further analysis indicates that the models evolved by the proposed method are more likely to contain only the truly relevant features and have better interpretability. © 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.