Efficient Optimization of the Parameters of LS-SVM for Regression versus Cross-Validation Error

Author(s):  
Ginés Rubio ◽  
Héctor Pomares ◽  
Ignacio Rojas ◽  
Luis Javier Herrera ◽  
Alberto Guillén


1996 ◽
Vol 8 (7) ◽  
pp. 1391-1420 ◽  
Author(s):  
David H. Wolpert

This is the second of two papers that use off-training set (OTS) error to investigate the assumption-free relationship between learning algorithms. The first paper discusses a particular set of ways to compare learning algorithms, according to which there are no distinctions between learning algorithms. This second paper concentrates on different ways of comparing learning algorithms from those used in the first paper. In particular, this second paper discusses the associated a priori distinctions that do exist between learning algorithms. In this second paper it is shown, loosely speaking, that for loss functions other than zero-one (e.g., quadratic loss), there are a priori distinctions between algorithms. However, even for such loss functions, it is shown here that any algorithm is equivalent on average to its “randomized” version, and in this sense still has no first-principles justification in terms of average error. Nonetheless, as this paper discusses, it may be that (for example) cross-validation has better head-to-head minimax properties than “anti-cross-validation” (choose the learning algorithm with the largest cross-validation error). This may be true even for zero-one loss, a loss function for which the notion of “randomization” would not be relevant. This paper also analyzes averages over hypotheses rather than targets. Such analyses hold for all possible priors over targets. Accordingly they prove, as a particular example, that cross-validation cannot be justified as a Bayesian procedure. In fact, for a very natural restriction of the class of learning algorithms, one should use anti-cross-validation rather than cross-validation (!).
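To make the comparison concrete, the following is a minimal sketch (not from the paper) that scores a small pool of learning algorithms by k-fold cross-validation error under zero-one loss and then selects one either by cross-validation (smallest error) or by anti-cross-validation (largest error). The data set and the two candidate algorithms are illustrative assumptions.

```python
# Minimal sketch: cross-validation vs. "anti-cross-validation" selection.
# The algorithm pool and synthetic data are assumptions for illustration.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

algorithms = {
    "knn": KNeighborsClassifier(n_neighbors=5),
    "tree": DecisionTreeClassifier(random_state=0),
}

# Zero-one cross-validation error = 1 - mean accuracy over the folds.
cv_error = {
    name: 1.0 - cross_val_score(est, X, y, cv=5, scoring="accuracy").mean()
    for name, est in algorithms.items()
}

chosen_by_cv = min(cv_error, key=cv_error.get)        # cross-validation
chosen_by_anti_cv = max(cv_error, key=cv_error.get)   # anti-cross-validation

print(cv_error)
print("cross-validation picks:", chosen_by_cv)
print("anti-cross-validation picks:", chosen_by_anti_cv)
```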


Author(s):  
Dohyun Park ◽  
Yongbin Lee ◽  
Dong-Hoon Choi

Many meta-models have been developed to approximate true responses. These meta-models are often used for optimization in place of computer simulations, which require high computational cost. However, designers do not know in advance which meta-model is the best one, because the accuracy of each meta-model differs from problem to problem. To address this difficulty, research on ensembles of meta-models that combine stand-alone meta-models has recently been pursued with the expectation of improving prediction accuracy. In this study, we propose a selection method of weight factors for the ensemble of meta-models based on the v-nearest neighbors’ cross-validation error (CV). The four stand-alone meta-models employed in this study are polynomial regression, Kriging, radial basis function, and support vector regression. Each method is applied to five 1-D mathematical examples and ten 2-D mathematical examples. The prediction accuracy of each stand-alone meta-model and of the existing ensemble of meta-models is compared. The ensemble of meta-models shows higher accuracy than the worst of the four stand-alone meta-models in all test examples (30 cases). In addition, the ensemble of meta-models shows the highest accuracy in 5 of the test cases. Although it has lower accuracy than the best stand-alone meta-model, it has almost the same RMSE (a ratio of less than 1.1 relative to the best stand-alone model) in 16 out of 30 test cases. From these results, we conclude that the proposed method is effective and robust.
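As a rough illustration of how cross-validation errors can drive the weights of an ensemble of meta-models, here is a generic sketch using inverse-error weighting; it is not the v-nearest-neighbors scheme proposed in the paper, and the CV errors and surrogate predictions are hypothetical numbers.

```python
# Generic sketch: turn per-surrogate cross-validation errors into ensemble
# weights (proportional to 1/CV-error) and form a weighted-average prediction.
import numpy as np

def cv_error_weights(cv_errors, eps=1e-12):
    """Weights proportional to 1/CV-error, normalised to sum to one."""
    inv = 1.0 / (np.asarray(cv_errors, dtype=float) + eps)
    return inv / inv.sum()

def ensemble_predict(predictions, weights):
    """Weighted average of the stand-alone surrogate predictions.

    predictions: array of shape (n_surrogates, n_points)
    weights:     array of shape (n_surrogates,)
    """
    return np.asarray(weights) @ np.asarray(predictions)

# Hypothetical CV errors for polynomial regression, Kriging, RBF, and SVR.
errors = [0.40, 0.15, 0.22, 0.30]
w = cv_error_weights(errors)

# Hypothetical predictions of the four surrogates at three test points.
preds = np.array([[1.0, 2.0, 3.0],
                  [1.2, 1.9, 3.1],
                  [0.9, 2.1, 2.8],
                  [1.1, 2.0, 3.2]])
print("weights:", w)
print("ensemble prediction:", ensemble_predict(preds, w))
```

Surrogates that cross-validate better thus contribute more to the combined prediction, which is the basic intent behind CV-based weight factors.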


Author(s):  
Reza Alizadeh ◽  
Liangyue Jia ◽  
Anand Balu Nellippallil ◽  
Guoxin Wang ◽  
Jia Hao ◽  
...  

In engineering design, surrogate models are often used instead of costly computer simulations. Typically, a single surrogate model is selected based on previous experience. We observe, based on an analysis of the published literature, that fitting an ensemble of surrogates (EoS) based on cross-validation errors is more accurate but requires more computational time. In this paper, we propose a method to build an EoS that is both accurate and less computationally expensive. In the proposed method, the EoS is a weighted average surrogate of response surface models, Kriging, and radial basis functions, with weights based on the overall cross-validation error. We demonstrate that the created EoS is more accurate than the individual surrogates even when fewer data points are used, and is therefore computationally efficient with relatively insensitive predictions. We demonstrate the use of an EoS using hot rod rolling as an example. Finally, we include a rule-based template that can be used for other problems with similar requirements, for example, in computational time, required accuracy, and the size of the data.
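A minimal sketch of the "overall cross-validation error" ingredient is given below. It is an assumption-laden stand-in, not the authors' implementation: scikit-learn models (a quadratic response surface, a default Gaussian process as the Kriging model, and an RBF-kernel ridge regressor) and leave-one-out RMSE on a small synthetic data set.

```python
# Sketch: leave-one-out RMSE per candidate surrogate; the resulting errors
# could then drive the weights of the ensemble of surrogates (EoS).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(25, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2          # illustrative true response

surrogates = {
    "response surface": make_pipeline(PolynomialFeatures(2), LinearRegression()),
    "kriging": GaussianProcessRegressor(),
    "rbf": KernelRidge(kernel="rbf", gamma=1.0),
}

loo_rmse = {}
for name, model in surrogates.items():
    scores = cross_val_score(model, X, y, cv=LeaveOneOut(),
                             scoring="neg_mean_squared_error")
    loo_rmse[name] = float(np.sqrt(-scores.mean()))

print(loo_rmse)   # smaller is better; can then drive the EoS weights
```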


Author(s):  
Felipe A. C. Viana ◽  
Raphael T. Haftka

Surrogate models are commonly used to replace expensive simulations of engineering problems. Frequently, a single surrogate is chosen based on past experience. Previous work has shown that fitting multiple surrogates and picking one based on cross-validation errors (PRESS in particular) is a good strategy, and that cross-validation errors may also be used to create a weighted surrogate. In this paper, we discuss whether to use the best PRESS solution or a weighted surrogate when a single surrogate is needed. We propose the minimization of the integrated square error as a way to compute the weights of the weighted average surrogate. We find that it pays to generate a large set of different surrogates and then use PRESS as a criterion for selection. We find that the cross-validation error vectors provide an excellent estimate of the RMS errors when the number of data points is high. Hence the use of cross-validation errors for choosing a surrogate and for calculating the weights of weighted surrogates becomes more attractive in high dimensions. However, it appears that the potential gains from using weighted surrogates diminish substantially in high dimensions.
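The following sketch illustrates one common way to realize the two ideas described in the abstract: a PRESS-based RMS estimate per surrogate for selection, and weights derived from the cross-validation error vectors so as to minimize the mean square error of the weighted surrogate (subject to the weights summing to one). The residual vectors are invented for illustration, and the small ridge term is an added numerical safeguard rather than part of the formulation.

```python
# Sketch: PRESS-RMS per surrogate and error-covariance-based ensemble weights.
import numpy as np

def press_rms(e):
    """PRESS-based RMS error estimate for one surrogate."""
    e = np.asarray(e, dtype=float)
    return np.sqrt(np.mean(e ** 2))

def optimal_weights(E, ridge=1e-10):
    """Weights minimising the weighted-surrogate MSE.

    E: (n_points, n_surrogates) matrix of cross-validation residuals.
    """
    n, m = E.shape
    C = E.T @ E / n + ridge * np.eye(m)   # estimated error covariance
    w = np.linalg.solve(C, np.ones(m))    # w proportional to C^{-1} 1
    return w / w.sum()

# Hypothetical cross-validation residuals of three surrogates at 5 points.
E = np.array([[ 0.3, -0.1,  0.2],
              [-0.2,  0.1, -0.1],
              [ 0.4, -0.2,  0.3],
              [-0.3,  0.2, -0.2],
              [ 0.1,  0.0,  0.1]])

rms = [press_rms(E[:, j]) for j in range(E.shape[1])]
print("PRESS-RMS per surrogate:", [round(v, 3) for v in rms])
print("best surrogate by PRESS:", int(np.argmin(rms)))
print("weights of weighted surrogate:", optimal_weights(E))
```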


2020 ◽  
Author(s):  
Shan Wang ◽  
Junhua Ye ◽  
Qun Xu ◽  
Xin Xu ◽  
Yingying Yang ◽  
...  

Background: Classification of germplasm collections is of great importance for both the conservation and utilization of genetic resources. Thus, it is necessary to estimate and classify rice varieties in order to utilize these germplasms more efficiently for rice breeding. However, molecular classification of large germplasm collections can be costly and labor-intensive. Development of an informative panel of a few markers would allow rapid and cost-effective assignment of crops to genetic sub-populations. Results: Here, the minimum number of random SNPs for rice classification (MNRSRC) was studied using a panel of 51 rice varieties belonging to different sub-groups. Genetic structure analysis showed that the rice panel could be clearly divided into five subgroups. The MNRSRC was estimated using a SNP random sampling method based on genetic diversity and population structure analysis. In the genetic diversity analysis, the coefficient of variation (CV) was analyzed for MNRSRC estimation, and we found that the CV tended to plateau when the number of SNPs was around 200, which was verified by both the cross-validation error of the K value and correlation analysis of genetic distances. When the number of SNPs was greater than 200, the distribution of cross-validation error values tended to be similar, and the correlation coefficients, almost all greater than 0.95, exhibited a small range of variation. In addition, we found that the MNRSRC might not be affected by the number or type of varieties. Conclusion: The MNRSRC was estimated using a SNP random sampling method based on genetic diversity and population structure analysis. The results demonstrated that at least about 200 randomly sampled, filtered SNP loci were required for classification in a rice panel. In addition, we found that the MNRSRC might not be affected by the number or type of varieties. This study of the MNRSRC can provide a reference and theoretical basis for the classification of different types of rice panels.
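An illustrative sketch of the random-SNP subsampling logic is shown below. It uses simulated 0/1/2 genotype calls and Euclidean genetic distances as stand-ins (not the authors' pipeline or data), so the absolute correlation values are not meaningful; only the mechanics of comparing subsampled distances against the full SNP set are shown.

```python
# Sketch: correlate genetic distances from random SNP subsets of increasing
# size with distances from the full SNP set.  Genotypes are simulated.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
n_varieties, n_snps = 51, 5000
genotypes = rng.integers(0, 3, size=(n_varieties, n_snps)).astype(float)

full_dist = pdist(genotypes, metric="euclidean")   # distances from all SNPs

for k in (50, 100, 200, 400):
    r_values = []
    for _ in range(20):                            # 20 random draws per size
        cols = rng.choice(n_snps, size=k, replace=False)
        sub_dist = pdist(genotypes[:, cols], metric="euclidean")
        r_values.append(pearsonr(sub_dist, full_dist)[0])
    print(f"{k:4d} SNPs: mean correlation with full panel = {np.mean(r_values):.3f}")
```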


2019 ◽  
Vol 141 (7) ◽  
Author(s):  
Daniel Correia ◽  
Daniel N. Wilke

The construction of surrogate models, such as radial basis function (RBF) and Kriging-based surrogates, requires an invertible (square, full-rank) or pseudoinvertible (overdetermined) linear system to be solved. This study demonstrates that the method used to solve this linear system may result in up to five orders of magnitude difference in the accuracy of the constructed surrogate model using exactly the same information. Hence, this paper makes the canonical and important point toward reproducible science: the details of solving the linear system when constructing a surrogate model must be communicated. This point is clearly illustrated on a single function, namely the Styblinski–Tang test function, by constructing over 200 RBF surrogate models from 128 Latin hypercube sampled points. The linear system in the construction of each surrogate model was solved using LU, QR, Cholesky, singular value decomposition, and the Moore–Penrose pseudoinverse. As we show, the decomposition method influences the utility of the surrogate model, which depends on the application, i.e., whether an accurate approximation of the surrogate is required or whether the ability to optimize the surrogate and capture the optimal design is pertinent. Evidently, the selection of the optimal hyperparameters based on the cross-validation error also significantly impacts the utility of the constructed surrogate. For our problem, it turns out that selecting the hyperparameters at the lowest cross-validation error favors function approximation but adversely affects the ability to optimize the surrogate model. This is demonstrated by optimizing each constructed surrogate model from 16 fixed initial starting points and recording the optimal designs. For our problem, selecting the optimal hyperparameter that coincides with the lowest monotonically decreasing function value significantly improves the ability to optimize the surrogate for most solution strategies.
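The sketch below reproduces the flavor of this experiment under stated assumptions (a Gaussian RBF kernel with an arbitrary shape parameter and uniform random samples rather than a Latin hypercube): the same interpolation system for the Styblinski–Tang function is solved by LU, Cholesky, QR, SVD-based least squares, and the Moore–Penrose pseudoinverse, and the training residuals of the resulting coefficient vectors are compared.

```python
# Sketch: one RBF interpolation system, several linear solvers, different residuals.
import numpy as np
from scipy import linalg

def styblinski_tang(X):
    """Styblinski-Tang test function used in the paper."""
    return 0.5 * np.sum(X**4 - 16 * X**2 + 5 * X, axis=1)

rng = np.random.default_rng(0)
X = rng.uniform(-5, 5, size=(60, 2))           # sampled design points (assumed)
y = styblinski_tang(X)

# Gaussian RBF interpolation matrix K w = y; the shape parameter is assumed.
eps = 0.5
r2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-eps * r2)

def solve_lu(K, y):
    return linalg.lu_solve(linalg.lu_factor(K), y)

def solve_cholesky(K, y):
    return linalg.cho_solve(linalg.cho_factor(K), y)

def solve_qr(K, y):
    Q, R = linalg.qr(K)
    return linalg.solve_triangular(R, Q.T @ y)

def solve_svd(K, y):
    return np.linalg.lstsq(K, y, rcond=None)[0]

def solve_pinv(K, y):
    return np.linalg.pinv(K) @ y

for name, solver in [("LU", solve_lu), ("Cholesky", solve_cholesky),
                     ("QR", solve_qr), ("SVD/lstsq", solve_svd),
                     ("pinv", solve_pinv)]:
    try:
        w = solver(K, y)
    except np.linalg.LinAlgError:
        print(f"{name:9s} factorisation failed")
        continue
    print(f"{name:9s} training residual = {np.linalg.norm(K @ w - y):.3e}")
```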

