HYPER-PARAMETER SELECTION FOR SPARSE LS-SVM VIA MINIMIZATION OF ITS LOCALIZED GENERALIZATION ERROR

Author(s):  
BINBIN SUN ◽  
WING W. Y. NG ◽  
DANIEL S. YEUNG ◽  
PATRICK P. K. CHAN

Sparse LS-SVM yields better generalization capability and reduces prediction time in comparison to full dense LS-SVM. However, both methods require careful hyper-parameter selection (HPS) to achieve high generalization capability. Leave-One-Out Cross Validation (LOO-CV) and k-fold Cross Validation (k-CV) are the two most widely used hyper-parameter selection methods for LS-SVMs, but both fail to select good hyper-parameters for sparse LS-SVM. In this paper we propose a new hyper-parameter selection method, LGEM-HPS, for LS-SVM via minimization of the Localized Generalization Error (L-GEM). The L-GEM consists of two major components: the empirical mean square error and a sensitivity measure. A new sensitivity measure is derived for LS-SVM to enable LGEM-HPS to select hyper-parameters that yield an LS-SVM with small training error and minimum sensitivity to minor changes in the inputs. Experiments on eleven UCI data sets show the effectiveness of the proposed method for selecting hyper-parameters for sparse LS-SVM.
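For concreteness, below is a minimal sketch of hyper-parameter selection by minimizing a "training error plus sensitivity" score of the kind described above. It uses scikit-learn's KernelRidge as a stand-in for LS-SVM regression; the Monte-Carlo perturbation sensitivity, the perturbation scale q, and the grid values are illustrative assumptions, not the paper's exact L-GEM formulation.

```python
# Hypothetical sketch: pick RBF-kernel hyper-parameters by minimizing an
# L-GEM-style score = training MSE + a stochastic sensitivity term.
# KernelRidge stands in for LS-SVM regression; data, grid, and the
# perturbation scale `q` are assumptions for illustration only.
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

def lgem_score(model, X, y, q=0.1, n_perturb=20):
    """Training MSE plus mean squared output change under small input noise."""
    pred = model.predict(X)
    emp_mse = np.mean((y - pred) ** 2)
    sens = np.mean([
        np.mean((model.predict(X + rng.normal(scale=q, size=X.shape)) - pred) ** 2)
        for _ in range(n_perturb)
    ])
    return emp_mse + sens

best = None
for gamma in [0.01, 0.1, 1.0]:
    for alpha in [0.01, 0.1, 1.0]:
        model = KernelRidge(kernel="rbf", gamma=gamma, alpha=alpha).fit(X, y)
        score = lgem_score(model, X, y)
        if best is None or score < best[0]:
            best = (score, gamma, alpha)

print("selected (gamma, alpha):", best[1:], "L-GEM-style score:", round(best[0], 4))
```

The point of the score is visible in the grid search: a very small bandwidth may drive the training error toward zero, but its sensitivity term grows, so the selected pair balances the two components.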

Author(s):  
WING W. Y. NG ◽  
DANIEL S. YEUNG ◽  
ERIC C. C. TSANG

We previously developed the localized generalization error model for supervised learning with minimization of the mean square error. In this work, we extend the error model to the Single Layer Perceptron Neural Network (SLPNN) and the Support Vector Machine (SVM) with a sigmoid kernel function. For a trained SLPNN or SVM and a given training dataset, the proposed error model bounds from above the error for unseen samples that are similar to the training samples. As the major component of the localized generalization error model, the stochastic sensitivity measure formula for the perceptron neural network derived in this work relaxes the assumptions made in previous works that all inputs follow the same distribution and that each sample is perturbed only once. This makes the sensitivity measure applicable to pattern classification problems. The stochastic sensitivity measure of the SVM with a sigmoid kernel is also derived in this work as a component of the localized generalization error model. At the end of this paper, we discuss the advantages of the proposed error bound over existing error bounds.
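As a hedged sketch of the quantity at the heart of this abstract, the stochastic sensitivity measure can be written as the expected squared change in the trained model's output under random input perturbations; the notation below is illustrative and the paper's exact expectation structure and perturbation distribution may differ.

```latex
% Sketch of the stochastic sensitivity measure described above: the expected
% squared output perturbation of the trained model f under random input
% perturbations \Delta x (notation is an assumption, not the paper's exact form).
\[
  E_{\mathrm{SM}}
  \;=\;
  \mathbb{E}_{x}\,\mathbb{E}_{\Delta x}\!\left[
    \bigl(f(x + \Delta x) - f(x)\bigr)^{2}
  \right]
\]
% Together with the empirical mean square error on the training set, this term
% forms the localized generalization error bound for unseen samples that lie
% within a small perturbation of the training samples.
```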


Author(s):  
WASIF AFZAL ◽  
RICHARD TORKAR ◽  
ROBERT FELDT

Given the number of algorithms available for classification and prediction in software engineering, there is a need for a systematic way of assessing their performance. This assessment is typically done by some form of partitioning or resampling of the original data to alleviate biased estimation. For predictive and classification studies in software engineering, there is a lack of definitive advice on the most appropriate resampling method to use. This is seen as one of the contributing factors to the inability to draw general conclusions on which modeling technique or set of predictor variables is most appropriate. Furthermore, the use of a variety of resampling methods makes it impossible to perform any formal meta-analysis of the primary study results. It is therefore desirable to examine the influence of various resampling methods and to quantify possible differences. Objective and method: This study empirically compares five common resampling methods (hold-out validation, repeated random sub-sampling, 10-fold cross-validation, leave-one-out cross-validation and non-parametric bootstrapping) using eight publicly available data sets, with genetic programming (GP) and multiple linear regression (MLR) as software quality classification approaches. The location of (PF, PD) pairs in the ROC (receiver operating characteristic) space and the area under an ROC curve (AUC) are used as accuracy indicators. Results: In terms of the location of (PF, PD) pairs in the ROC space, bootstrapping results lie in the preferred region for 3 of the 8 data sets for GP and for 4 of the 8 data sets for MLR. Based on the AUC measure, there are no significant differences between the resampling methods for either GP or MLR. Conclusion: Certain data set properties may be responsible for the insignificant differences between the resampling methods based on AUC; these include imbalanced data sets, insignificant predictor variables and high-dimensional data sets. With the current selection of data sets and classification techniques, bootstrapping is the preferred method based on the location of (PF, PD) pairs in the ROC space. Hold-out validation is not a good choice for comparatively smaller data sets, where leave-one-out cross-validation (LOOCV) performs better. For comparatively larger data sets, 10-fold cross-validation performs better than LOOCV.
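The sketch below illustrates how three of the resampling schemes compared above can be wired up to produce an out-of-sample AUC. Logistic regression is used as a stand-in for the GP/MLR classifiers and the data are synthetic; the study's actual data sets, models and ROC-region analysis are not reproduced.

```python
# Illustrative sketch (not the study's setup): out-of-sample AUC under three
# resampling schemes. Held-out predictions are pooled before scoring so that
# LOOCV, which tests one sample per fold, still yields a single ROC curve.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_predict
from sklearn.metrics import roc_auc_score
from sklearn.utils import resample

X, y = make_classification(n_samples=150, n_features=8, random_state=0)
clf = LogisticRegression(max_iter=1000)

# 10-fold CV and LOOCV: pool the held-out class probabilities, then score one AUC.
for name, cv in [("10-fold CV", KFold(10, shuffle=True, random_state=0)),
                 ("LOOCV", LeaveOneOut())]:
    proba = cross_val_predict(clf, X, y, cv=cv, method="predict_proba")[:, 1]
    print(name, round(roc_auc_score(y, proba), 3))

# Non-parametric bootstrap: train on a resampled set, test on the out-of-bag rows.
rng = np.random.default_rng(0)
aucs = []
for _ in range(50):
    idx = resample(np.arange(len(y)), random_state=rng.integers(1 << 30))
    oob = np.setdiff1d(np.arange(len(y)), idx)
    proba = clf.fit(X[idx], y[idx]).predict_proba(X[oob])[:, 1]
    aucs.append(roc_auc_score(y[oob], proba))
print("bootstrap (mean over 50 reps)", round(float(np.mean(aucs)), 3))
```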


1999 ◽  
Vol 11 (6) ◽  
pp. 1427-1453 ◽  
Author(s):  
Michael Kearns ◽  
Dana Ron

In this article we prove sanity-check bounds for the error of the leave-one-out cross-validation estimate of the generalization error: that is, bounds showing that the worst-case error of this estimate is not much worse than that of the training error estimate. The name sanity check refers to the fact that although we often expect the leave-one-out estimate to perform considerably better than the training error estimate, we are here only seeking assurance that its performance will not be considerably worse. Perhaps surprisingly, such assurance has been given only for limited cases in the prior literature on cross-validation. Any nontrivial bound on the error of leave-one-out must rely on some notion of algorithmic stability. Previous bounds relied on the rather strong notion of hypothesis stability, whose application was primarily limited to nearest-neighbor and other local algorithms. Here we introduce the new and weaker notion of error stability and apply it to obtain sanity-check bounds for leave-one-out for other classes of learning algorithms, including training error minimization procedures and Bayesian algorithms. We also provide lower bounds demonstrating the necessity of some form of error stability for proving bounds on the error of the leave-one-out estimate, and the fact that for training error minimization algorithms, in the worst case such bounds must still depend on the Vapnik-Chervonenkis dimension of the hypothesis class.
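For readers unfamiliar with the two estimators being compared, the following sketch fixes notation; it is an illustrative rendering, not the authors' exact statement, and the hypothesis-class notation h_S and S^{\setminus i} is assumed here.

```latex
% Sketch of the two estimators compared above, for a learning algorithm that
% returns hypothesis h_S on training sample S = {(x_1,y_1),...,(x_m,y_m)}.
\[
  \hat{\varepsilon}_{\mathrm{loo}}(S)
    = \frac{1}{m}\sum_{i=1}^{m}
      \mathbf{1}\!\left[\,h_{S^{\setminus i}}(x_i) \neq y_i\,\right],
  \qquad
  \hat{\varepsilon}_{\mathrm{tr}}(S)
    = \frac{1}{m}\sum_{i=1}^{m}
      \mathbf{1}\!\left[\,h_{S}(x_i) \neq y_i\,\right]
\]
% Here S^{\setminus i} is S with the i-th example removed. The sanity-check
% bounds assert that, under error stability, the worst-case deviation of the
% leave-one-out estimate from the true error of h_S is not much larger than
% the corresponding guarantee for the training error estimate, which for
% training error minimization still depends on the VC dimension in the worst case.
```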


2020 ◽  
Vol 98 (Supplement_4) ◽  
pp. 10-11
Author(s):  
Jian Cheng ◽  
Rohan Fernando ◽  
Jack C Dekkers

Abstract Efficient strategies have been developed for leave-one-out cross-validation (LOOCV) of predicted phenotypes in a simple model with an overall mean and marker effects or animal genetic effects, in order to evaluate the accuracy of genomic predictions. For such a model, the correlation between the predicted and observed phenotype is identical to the correlation between the observed phenotype and the estimated breeding value (EBV). When the model is more complex, with multiple fixed and random effects, the correlation between the observed and predicted phenotype can still be obtained efficiently by LOOCV, but it is not equal to the correlation between the observed phenotype and the EBV, which is the statistic of interest. The objective here was to develop and evaluate an efficient LOOCV method for EBV, or for predictions of other random effects, under a general mixed linear model. The approach is based on treating all effects in the model as random, with large variances assigned to the fixed effects. Naïve LOOCV requires inverting the (n − 1) × (n − 1) phenotypic covariance matrix for each of the n (= number of observations) training data sets. Our method efficiently obtains these inverses from the inverse of the phenotypic covariance matrix for all n observations. Naïve LOOCV of EBV, with fixed effects pre-corrected using the training data, was compared with the new efficient LOOCV. The new efficient LOOCV for EBV was 962 times faster than naïve LOOCV, and prediction accuracies from the two strategies were the same (0.20). Funded by USDA-NIFA grant # 2017-67007-26144.
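The sketch below illustrates the kind of linear-algebra shortcut that makes such an approach efficient: the inverse of the covariance matrix with one row and column removed can be recovered from the full inverse via a standard block-inverse (Schur complement) identity, so the n training-set inverses never have to be computed from scratch. This is a general illustration of the idea, not the authors' exact algorithm.

```python
# Hedged sketch: given the inverse of the full n x n phenotypic covariance
# matrix V, obtain the inverse of V with row/column i deleted from a standard
# block-inverse identity, instead of n separate (n-1) x (n-1) inversions.
import numpy as np

def loo_inverse(V_inv, i):
    """Inverse of V with row and column i removed, computed from the full inverse V_inv."""
    keep = np.delete(np.arange(V_inv.shape[0]), i)
    A = V_inv[np.ix_(keep, keep)]   # full inverse with row/column i dropped
    b = V_inv[keep, i]              # the dropped column of the full inverse
    return A - np.outer(b, b) / V_inv[i, i]

# Quick check against direct inversion on a random symmetric positive-definite matrix.
rng = np.random.default_rng(0)
M = rng.normal(size=(6, 6))
V = M @ M.T + 6 * np.eye(6)
V_inv = np.linalg.inv(V)
direct = np.linalg.inv(np.delete(np.delete(V, 2, axis=0), 2, axis=1))
assert np.allclose(loo_inverse(V_inv, 2), direct)
print("block-inverse LOO update matches direct inversion")
```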


1993 ◽  
Vol 39 (9) ◽  
pp. 1998-2004 ◽  
Author(s):  
M L Astion ◽  
M H Wener ◽  
R G Thomas ◽  
G G Hunder ◽  
D A Bloch

Abstract Backpropagation neural networks are a computer-based pattern-recognition method that has been applied to the interpretation of clinical data. Unlike rule-based pattern recognition, backpropagation networks learn by being repetitively trained with examples of the patterns to be differentiated. We describe and analyze the phenomenon of overtraining in backpropagation networks. Overtraining refers to the reduction in generalization ability that can occur as networks are trained. The clinical application we used was the differentiation of giant cell arteritis (GCA) from other forms of vasculitis (OTH) based on results for 807 patients (593 OTH, 214 GCA) and eight clinical predictor variables. The 807 cases were randomly assigned to either a training set with 404 cases or to a cross-validation set with the remaining 403 cases. The cross-validation set was used to monitor generalization during training. Results were obtained for eight networks, each derived from a different random assignment of the 807 cases. Training error monotonically decreased during training. In contrast, the cross-validation error usually reached a minimum early in training while the training error was still decreasing. Training beyond the minimum cross-validation error was associated with an increased cross-validation error. The shape of the cross-validation error curve and the point during training corresponding to the minimum cross-validation error varied with the composition of the data sets and the training conditions. The study indicates that training error is not a reliable indicator of a network's ability to generalize. To find the point during training when a network generalizes best, one must monitor cross-validation error separately.
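A minimal sketch of the monitoring procedure the study recommends is shown below: train a small backpropagation network incrementally, track the error on a held-out (cross-validation) set at each epoch, and keep the network from the epoch with the minimum held-out error rather than the minimum training error. The dataset, network size and epoch count are assumptions for illustration, not the GCA/vasculitis setup of the paper.

```python
# Illustrative early-stopping sketch: monitor cross-validation error during
# training and retain the weights at its minimum, since training error alone
# keeps decreasing and is not a reliable indicator of generalization.
import copy
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=800, n_features=8, random_state=1)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.5, random_state=1)

net = MLPClassifier(hidden_layer_sizes=(8,), solver="adam", random_state=1)
best_err, best_epoch, best_net = np.inf, -1, None
for epoch in range(200):
    net.partial_fit(X_tr, y_tr, classes=np.unique(y))
    val_err = 1.0 - net.score(X_val, y_val)          # cross-validation error this epoch
    if val_err < best_err:
        best_err, best_epoch, best_net = val_err, epoch, copy.deepcopy(net)

print(f"minimum cross-validation error {best_err:.3f} reached at epoch {best_epoch}")
```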


2019 ◽  
Vol 76 (7) ◽  
pp. 2349-2361
Author(s):  
Benjamin Misiuk ◽  
Trevor Bell ◽  
Alec Aitken ◽  
Craig J Brown ◽  
Evan N Edinger

Abstract Species distribution models are commonly used in the marine environment as management tools. The high cost of collecting marine data for modelling makes such data finite, especially in remote locations. Underwater image datasets from multiple surveys were leveraged to model the presence–absence and abundance of Arctic soft-shell clam (Mya spp.) to support the management of a local small-scale fishery in Qikiqtarjuaq, Nunavut, Canada. These models were combined to predict Mya abundance, conditional on presence, throughout the study area. Results suggested that water depth was the primary environmental factor limiting Mya habitat suitability, yet seabed topography and substrate characteristics influenced their abundance within suitable habitat. Ten-fold cross-validation and spatial leave-one-out cross-validation (LOO CV) were used to assess the accuracy of the combined predictions and to test whether this accuracy was inflated by spatial autocorrelation in the transect sample data. Results demonstrated that four different measures of predictive accuracy were substantially inflated due to spatial autocorrelation, and the spatial LOO CV results were therefore adopted as the best estimates of performance.
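The following sketch shows one common way to implement a spatial leave-one-out scheme of the kind contrasted with ordinary cross-validation above: for each held-out sample, training points within a buffer distance of it are also excluded, so spatially autocorrelated neighbours cannot inflate the accuracy estimate. The synthetic data, buffer radius and logistic-regression model are assumptions, not the study's actual survey data or species distribution models.

```python
# Hedged sketch of a buffered spatial leave-one-out cross-validation for a
# presence/absence model: neighbours of the test point are removed from the
# training set, unlike plain LOO.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(2)
coords = rng.uniform(0, 100, size=(120, 2))              # sample positions (x, y)
depth = rng.uniform(5, 60, size=120)                     # environmental predictor
presence = ((depth < 30).astype(int) ^ (rng.random(120) < 0.1)).astype(int)  # noisy labels
X = depth.reshape(-1, 1)

def spatial_loo_accuracy(X, y, coords, buffer):
    """Buffered spatial LOO: drop training points within `buffer` of the held-out sample."""
    preds = np.empty_like(y)
    for i in range(len(y)):
        dist = np.linalg.norm(coords - coords[i], axis=1)
        train = dist > buffer                            # also drops the focal point itself
        model = LogisticRegression(max_iter=1000).fit(X[train], y[train])
        preds[i] = model.predict(X[i:i + 1])[0]
    return accuracy_score(y, preds)

print("plain LOO accuracy (buffer = 0): ", round(spatial_loo_accuracy(X, presence, coords, 0.0), 3))
print("spatial LOO accuracy (buffer = 10):", round(spatial_loo_accuracy(X, presence, coords, 10.0), 3))
```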


2014 ◽  
Vol 79 (8) ◽  
pp. 965-975 ◽  
Author(s):  
Long Jiao ◽  
Xiaofei Wang ◽  
Hua Li ◽
Yunxia Wang

The quantitative structure–property relationship (QSPR) for the gas/particle partition coefficient, Kp, of polychlorinated biphenyls (PCBs) was investigated. The molecular distance-edge vector (MDEV) index was used as the structural descriptor of the PCBs. The quantitative relationship between the MDEV index and log Kp was modeled by multivariate linear regression (MLR) and an artificial neural network (ANN), respectively. Leave-one-out cross validation and external validation were carried out to assess the prediction ability of the developed models. With the MLR method, the root mean square relative error (RMSRE) of prediction for leave-one-out cross validation and external validation is 4.72 and 8.62, respectively. With the ANN method, the prediction RMSRE for leave-one-out cross validation and external validation is 3.87 and 7.47, respectively. The developed models are thus shown to be practicable for predicting the Kp of PCBs, and the MDEV index is shown to be quantitatively related to the Kp of PCBs.
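A minimal sketch of the validation scheme used above follows: leave-one-out cross-validation of a multivariate linear regression, summarized by a root mean square relative error (RMSRE). The descriptor matrix and log Kp values are synthetic placeholders, not the MDEV descriptors or PCB partition coefficients from the paper.

```python
# Hedged sketch: LOO cross-validation of an MLR model with an RMSRE summary.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 2))                                   # stand-in for MDEV descriptors
log_kp = -6.0 + 1.2 * X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.normal(size=40)

# Pool the leave-one-out predictions, then compute the relative-error summary.
pred = cross_val_predict(LinearRegression(), X, log_kp, cv=LeaveOneOut())
rmsre = np.sqrt(np.mean(((pred - log_kp) / log_kp) ** 2)) * 100  # percent
print(f"LOO-CV RMSRE: {rmsre:.2f}%")
```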

