Efficient Optimization of the Parameters of LS-SVM for Regression versus Cross-Validation Error

Author(s):  
Ginés Rubio ◽  
Héctor Pomares ◽  
Ignacio Rojas ◽  
Luis Javier Herrera ◽  
Alberto Guillén


1996 ◽
Vol 8 (7) ◽  
pp. 1391-1420 ◽  
Author(s):  
David H. Wolpert

This is the second of two papers that use off-training set (OTS) error to investigate the assumption-free relationship between learning algorithms. The first paper discusses a particular set of ways to compare learning algorithms, according to which there are no distinctions between learning algorithms. This second paper concentrates on different ways of comparing learning algorithms from those used in the first paper. In particular, this second paper discusses the associated a priori distinctions that do exist between learning algorithms. In this second paper it is shown, loosely speaking, that for loss functions other than zero-one (e.g., quadratic loss), there are a priori distinctions between algorithms. However, even for such loss functions, it is shown here that any algorithm is equivalent on average to its “randomized” version, and in this sense still has no first-principles justification in terms of average error. Nonetheless, as this paper discusses, it may be that (for example) cross-validation has better head-to-head minimax properties than “anti-cross-validation” (choose the learning algorithm with the largest cross-validation error). This may be true even for zero-one loss, a loss function for which the notion of “randomization” would not be relevant. This paper also analyzes averages over hypotheses rather than targets. Such analyses hold for all possible priors over targets. Accordingly they prove, as a particular example, that cross-validation cannot be justified as a Bayesian procedure. In fact, for a very natural restriction of the class of learning algorithms, one should use anti-cross-validation rather than cross-validation (!).
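To make the comparison concrete, the following is a minimal sketch (not from the paper) that scores a small pool of learning algorithms by k-fold cross-validation error under zero-one loss and then selects one either by cross-validation (smallest error) or by anti-cross-validation (largest error). The data set and the two candidate algorithms are illustrative assumptions.

```python
# Minimal sketch: cross-validation vs. "anti-cross-validation" selection.
# The algorithm pool and synthetic data are assumptions for illustration.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

algorithms = {
    "knn": KNeighborsClassifier(n_neighbors=5),
    "tree": DecisionTreeClassifier(random_state=0),
}

# Zero-one cross-validation error = 1 - mean accuracy over the folds.
cv_error = {
    name: 1.0 - cross_val_score(est, X, y, cv=5, scoring="accuracy").mean()
    for name, est in algorithms.items()
}

chosen_by_cv = min(cv_error, key=cv_error.get)        # cross-validation
chosen_by_anti_cv = max(cv_error, key=cv_error.get)   # anti-cross-validation

print(cv_error)
print("cross-validation picks:", chosen_by_cv)
print("anti-cross-validation picks:", chosen_by_anti_cv)
```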


Author(s):  
Dohyun Park ◽  
Yongbin Lee ◽  
Dong-Hoon Choi

Many meta-models have been developed to approximate true responses. These meta-models are often used for optimization in place of computer simulations, which require high computational cost. However, designers do not know in advance which meta-model is the best one, because the accuracy of each meta-model differs from problem to problem. To address this difficulty, research on ensembles of meta-models that combine stand-alone meta-models has recently been pursued with the expectation of improving prediction accuracy. In this study, we propose a selection method of weight factors for the ensemble of meta-models based on the v-nearest neighbors’ cross-validation error (CV). The four stand-alone meta-models employed in this study are polynomial regression, Kriging, radial basis function, and support vector regression. Each method is applied to five 1-D mathematical examples and ten 2-D mathematical examples. The prediction accuracy of each stand-alone meta-model and of the existing ensemble of meta-models is compared. The ensemble of meta-models shows higher accuracy than the worst of the four stand-alone meta-models in all test examples (30 cases). In addition, the ensemble of meta-models shows the highest accuracy in 5 of the test cases. Although it has lower accuracy than the best stand-alone meta-model, it has almost the same RMSE (a ratio of less than 1.1 relative to the best stand-alone model) in 16 out of 30 test cases. From these results, we conclude that the proposed method is effective and robust.
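As a rough illustration of how cross-validation errors can drive the weights of an ensemble of meta-models, here is a generic sketch using inverse-error weighting; it is not the v-nearest-neighbors scheme proposed in the paper, and the CV errors and surrogate predictions are hypothetical numbers.

```python
# Generic sketch: turn per-surrogate cross-validation errors into ensemble
# weights (proportional to 1/CV-error) and form a weighted-average prediction.
import numpy as np

def cv_error_weights(cv_errors, eps=1e-12):
    """Weights proportional to 1/CV-error, normalised to sum to one."""
    inv = 1.0 / (np.asarray(cv_errors, dtype=float) + eps)
    return inv / inv.sum()

def ensemble_predict(predictions, weights):
    """Weighted average of the stand-alone surrogate predictions.

    predictions: array of shape (n_surrogates, n_points)
    weights:     array of shape (n_surrogates,)
    """
    return np.asarray(weights) @ np.asarray(predictions)

# Hypothetical CV errors for polynomial regression, Kriging, RBF, and SVR.
errors = [0.40, 0.15, 0.22, 0.30]
w = cv_error_weights(errors)

# Hypothetical predictions of the four surrogates at three test points.
preds = np.array([[1.0, 2.0, 3.0],
                  [1.2, 1.9, 3.1],
                  [0.9, 2.1, 2.8],
                  [1.1, 2.0, 3.2]])
print("weights:", w)
print("ensemble prediction:", ensemble_predict(preds, w))
```

Surrogates that cross-validate better thus contribute more to the combined prediction, which is the basic intent behind CV-based weight factors.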


Author(s):  
Reza Alizadeh ◽  
Liangyue Jia ◽  
Anand Balu Nellippallil ◽  
Guoxin Wang ◽  
Jia Hao ◽  
...  

In engineering design, surrogate models are often used instead of costly computer simulations. Typically, a single surrogate model is selected based on previous experience. We observe, based on an analysis of the published literature, that fitting an ensemble of surrogates (EoS) based on cross-validation errors is more accurate but requires more computational time. In this paper, we propose a method to build an EoS that is both accurate and less computationally expensive. In the proposed method, the EoS is a weighted average surrogate of response surface models, Kriging, and radial basis functions, with weights based on the overall cross-validation error. We demonstrate that the created EoS is more accurate than the individual surrogates even when fewer data points are used, and is therefore computationally efficient with relatively insensitive predictions. We demonstrate the use of an EoS using hot rod rolling as an example. Finally, we include a rule-based template that can be used for other problems with similar requirements, for example, in computational time, required accuracy, and the size of the data.
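A minimal sketch of the "overall cross-validation error" ingredient is given below. It is an assumption-laden stand-in, not the authors' implementation: scikit-learn models (a quadratic response surface, a default Gaussian process as the Kriging model, and an RBF-kernel ridge regressor) and leave-one-out RMSE on a small synthetic data set.

```python
# Sketch: leave-one-out RMSE per candidate surrogate; the resulting errors
# could then drive the weights of the ensemble of surrogates (EoS).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(25, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2          # illustrative true response

surrogates = {
    "response surface": make_pipeline(PolynomialFeatures(2), LinearRegression()),
    "kriging": GaussianProcessRegressor(),
    "rbf": KernelRidge(kernel="rbf", gamma=1.0),
}

loo_rmse = {}
for name, model in surrogates.items():
    scores = cross_val_score(model, X, y, cv=LeaveOneOut(),
                             scoring="neg_mean_squared_error")
    loo_rmse[name] = float(np.sqrt(-scores.mean()))

print(loo_rmse)   # smaller is better; can then drive the EoS weights
```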


Author(s):  
Felipe A. C. Viana ◽  
Raphael T. Haftka

Surrogate models are commonly used to replace expensive simulations of engineering problems. Frequently, a single surrogate is chosen based on past experience. Previous work has shown that fitting multiple surrogates and picking one based on cross-validation errors (PRESS in particular) is a good strategy, and that cross-validation errors may also be used to create a weighted surrogate. In this paper, we discuss whether to use the best PRESS solution or a weighted surrogate when a single surrogate is needed. We propose the minimization of the integrated square error as a way to compute the weights of the weighted average surrogate. We find that it pays to generate a large set of different surrogates and then use PRESS as a criterion for selection. We find that the cross-validation error vectors provide an excellent estimate of the RMS errors when the number of data points is high. Hence the use of cross-validation errors for choosing a surrogate and for calculating the weights of weighted surrogates becomes more attractive in high dimensions. However, it appears that the potential gains from using weighted surrogates diminish substantially in high dimensions.
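The following sketch illustrates one common way to realize the two ideas described in the abstract: a PRESS-based RMS estimate per surrogate for selection, and weights derived from the cross-validation error vectors so as to minimize the mean square error of the weighted surrogate (subject to the weights summing to one). The residual vectors are invented for illustration, and the small ridge term is an added numerical safeguard rather than part of the formulation.

```python
# Sketch: PRESS-RMS per surrogate and error-covariance-based ensemble weights.
import numpy as np

def press_rms(e):
    """PRESS-based RMS error estimate for one surrogate."""
    e = np.asarray(e, dtype=float)
    return np.sqrt(np.mean(e ** 2))

def optimal_weights(E, ridge=1e-10):
    """Weights minimising the weighted-surrogate MSE.

    E: (n_points, n_surrogates) matrix of cross-validation residuals.
    """
    n, m = E.shape
    C = E.T @ E / n + ridge * np.eye(m)   # estimated error covariance
    w = np.linalg.solve(C, np.ones(m))    # w proportional to C^{-1} 1
    return w / w.sum()

# Hypothetical cross-validation residuals of three surrogates at 5 points.
E = np.array([[ 0.3, -0.1,  0.2],
              [-0.2,  0.1, -0.1],
              [ 0.4, -0.2,  0.3],
              [-0.3,  0.2, -0.2],
              [ 0.1,  0.0,  0.1]])

rms = [press_rms(E[:, j]) for j in range(E.shape[1])]
print("PRESS-RMS per surrogate:", [round(v, 3) for v in rms])
print("best surrogate by PRESS:", int(np.argmin(rms)))
print("weights of weighted surrogate:", optimal_weights(E))
```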


2020 ◽  
Author(s):  
Shan Wang ◽  
Junhua Ye ◽  
Qun Xu ◽  
Xin Xu ◽  
Yingying Yang ◽  
...  

Background: Classification of germplasm collections is of great importance for both the conservation and utilization of genetic resources. Thus, it is necessary to estimate and classify rice varieties in order to utilize these germplasms more efficiently for rice breeding. However, molecular classification of large germplasm collections can be costly and labor-intensive. Development of an informative panel of a few markers would allow rapid and cost-effective assignment of crops to genetic sub-populations. Results: Here, the minimum number of random SNPs for rice classification (MNRSRC) was studied using a panel of 51 rice varieties belonging to different sub-groups. Genetic structure analysis showed that the rice panel could be clearly divided into five subgroups. The MNRSRC was estimated using a SNP random sampling method based on genetic diversity and population structure analysis. In the genetic diversity analysis, the coefficient of variation (CV) was analyzed for MNRSRC estimation, and we found that the CV tended to plateau when the number of SNPs was around 200, which was verified by both the cross-validation error of the K value and correlation analysis of genetic distances. When the number of SNPs was greater than 200, the distribution of cross-validation error values tended to be similar, and the correlation coefficients, almost all greater than 0.95, exhibited a small range of variation. In addition, we found that the MNRSRC might not be affected by the number or type of varieties. Conclusion: The MNRSRC was estimated using a SNP random sampling method based on genetic diversity and population structure analysis. The results demonstrated that at least about 200 randomly sampled, filtered SNP loci were required for classification in a rice panel. In addition, we found that the MNRSRC might not be affected by the number or type of varieties. This study of the MNRSRC can provide a reference and theoretical basis for the classification of different types of rice panels.
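An illustrative sketch of the random-SNP subsampling logic is shown below. It uses simulated 0/1/2 genotype calls and Euclidean genetic distances as stand-ins (not the authors' pipeline or data), so the absolute correlation values are not meaningful; only the mechanics of comparing subsampled distances against the full SNP set are shown.

```python
# Sketch: correlate genetic distances from random SNP subsets of increasing
# size with distances from the full SNP set.  Genotypes are simulated.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
n_varieties, n_snps = 51, 5000
genotypes = rng.integers(0, 3, size=(n_varieties, n_snps)).astype(float)

full_dist = pdist(genotypes, metric="euclidean")   # distances from all SNPs

for k in (50, 100, 200, 400):
    r_values = []
    for _ in range(20):                            # 20 random draws per size
        cols = rng.choice(n_snps, size=k, replace=False)
        sub_dist = pdist(genotypes[:, cols], metric="euclidean")
        r_values.append(pearsonr(sub_dist, full_dist)[0])
    print(f"{k:4d} SNPs: mean correlation with full panel = {np.mean(r_values):.3f}")
```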


2019 ◽  
Vol 141 (7) ◽  
Author(s):  
Daniel Correia ◽  
Daniel N. Wilke

The construction of surrogate models, such as radial basis function (RBF) and Kriging-based surrogates, requires an invertible (square, full-rank) or pseudoinvertible (overdetermined) linear system to be solved. This study demonstrates that the method used to solve this linear system may result in up to five orders of magnitude difference in the accuracy of the constructed surrogate model using exactly the same information. Hence, this paper makes the canonical and important point toward reproducible science: the details of solving the linear system when constructing a surrogate model must be communicated. This point is clearly illustrated on a single function, namely the Styblinski–Tang test function, by constructing over 200 RBF surrogate models from 128 Latin hypercube sampled points. The linear system in the construction of each surrogate model was solved using LU, QR, Cholesky, singular value decomposition, and the Moore–Penrose pseudoinverse. As we show, the decomposition method influences the utility of the surrogate model, which depends on the application, i.e., whether an accurate approximation of the surrogate is required or whether the ability to optimize the surrogate and capture the optimal design is pertinent. Evidently, the selection of the optimal hyperparameters based on the cross-validation error also significantly impacts the utility of the constructed surrogate. For our problem, it turns out that selecting the hyperparameters at the lowest cross-validation error favors function approximation but adversely affects the ability to optimize the surrogate model. This is demonstrated by optimizing each constructed surrogate model from 16 fixed initial starting points and recording the optimal designs. For our problem, selecting the optimal hyperparameter that coincides with the lowest monotonically decreasing function value significantly improves the ability to optimize the surrogate for most solution strategies.
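The sketch below reproduces the flavor of this experiment under stated assumptions (a Gaussian RBF kernel with an arbitrary shape parameter and uniform random samples rather than a Latin hypercube): the same interpolation system for the Styblinski–Tang function is solved by LU, Cholesky, QR, SVD-based least squares, and the Moore–Penrose pseudoinverse, and the training residuals of the resulting coefficient vectors are compared.

```python
# Sketch: one RBF interpolation system, several linear solvers, different residuals.
import numpy as np
from scipy import linalg

def styblinski_tang(X):
    """Styblinski-Tang test function used in the paper."""
    return 0.5 * np.sum(X**4 - 16 * X**2 + 5 * X, axis=1)

rng = np.random.default_rng(0)
X = rng.uniform(-5, 5, size=(60, 2))           # sampled design points (assumed)
y = styblinski_tang(X)

# Gaussian RBF interpolation matrix K w = y; the shape parameter is assumed.
eps = 0.5
r2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-eps * r2)

def solve_lu(K, y):
    return linalg.lu_solve(linalg.lu_factor(K), y)

def solve_cholesky(K, y):
    return linalg.cho_solve(linalg.cho_factor(K), y)

def solve_qr(K, y):
    Q, R = linalg.qr(K)
    return linalg.solve_triangular(R, Q.T @ y)

def solve_svd(K, y):
    return np.linalg.lstsq(K, y, rcond=None)[0]

def solve_pinv(K, y):
    return np.linalg.pinv(K) @ y

for name, solver in [("LU", solve_lu), ("Cholesky", solve_cholesky),
                     ("QR", solve_qr), ("SVD/lstsq", solve_svd),
                     ("pinv", solve_pinv)]:
    try:
        w = solver(K, y)
    except np.linalg.LinAlgError:
        print(f"{name:9s} factorisation failed")
        continue
    print(f"{name:9s} training residual = {np.linalg.norm(K @ w - y):.3e}")
```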

