Peer Review #3 of "Discrete natural neighbour interpolation with uncertainty using cross-validation error-distance fields (v0.1)"

Author(s):  
A Crawford
2020 ◽  
Vol 6 ◽  
pp. e282
Author(s):  
Thomas R. Etherington

Interpolation techniques provide a method to convert point data of a geographic phenomenon into a continuous field estimate of that phenomenon, and have become a fundamental geocomputational technique for spatial and geographical analysts. Natural neighbour interpolation is one method of interpolation that has several useful properties: it is an exact interpolator, it creates a smooth surface free of any discontinuities, it is a local and spatially adaptive method, it requires no statistical assumptions, it can be applied to small datasets, and it is parameter free. However, as with any interpolation method, there will be uncertainty in how well the interpolated field values reflect the actual phenomenon values. Using a method based on natural-neighbour-distance-based rates of error calculated for data points via cross-validation, a cross-validation error-distance field can be produced to associate uncertainty with the interpolation. Virtual geography experiments demonstrate that, given an appropriate number of data points and sufficient spatial autocorrelation of the phenomenon being interpolated, the natural neighbour interpolation and cross-validation error-distance fields provide reliable estimates of value and error within the convex hull of the data points. While this method does not replace the need for analysts to use sound judgement in their interpolations, for those researchers for whom natural neighbour interpolation is the best interpolation option the method presented provides a way to assess the uncertainty associated with their interpolations.
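The error-distance idea above can be sketched numerically. The following is a minimal illustration and not the paper's implementation: it substitutes inverse-distance weighting for natural neighbour interpolation (which plain NumPy does not provide), computes a leave-one-out cross-validation error at each data point, and converts the nearest point's error rate into a distance-scaled uncertainty estimate. The function names and synthetic data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
pts = rng.uniform(0, 10, size=(30, 2))          # data point locations
vals = np.sin(pts[:, 0]) + np.cos(pts[:, 1])    # observed values

def idw(query, xy, z, p=2.0):
    """Inverse-distance-weighted estimate (stand-in for natural neighbour).
    Like natural neighbour, it is an exact interpolator at the data points."""
    d = np.linalg.norm(xy - query, axis=1)
    if np.any(d == 0):
        return z[np.argmin(d)]
    w = 1.0 / d**p
    return np.sum(w * z) / np.sum(w)

# Leave-one-out cross-validation error at each data point:
# predict point i from all other points, compare with its true value.
cv_err = np.empty(len(pts))
for i in range(len(pts)):
    mask = np.arange(len(pts)) != i
    cv_err[i] = abs(idw(pts[i], pts[mask], vals[mask]) - vals[i])

def error_field(query, xy, err):
    """Distance-scaled uncertainty: the nearest data point's CV error,
    expressed as a rate per unit of its own nearest-neighbour distance,
    grows linearly with distance from that point."""
    d = np.linalg.norm(xy - query, axis=1)
    j = np.argmin(d)
    nn = np.sort(np.linalg.norm(xy - xy[j], axis=1))[1]  # j's nearest-neighbour distance
    return err[j] / nn * d[j]

q = np.array([5.0, 5.0])
print(idw(q, pts, vals), error_field(q, pts, cv_err))
```

At a data point the estimated error is zero, consistent with an exact interpolator; between points the uncertainty grows with distance from the data, which is the qualitative behaviour the abstract describes.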


1996 ◽  
Vol 8 (7) ◽  
pp. 1391-1420 ◽  
Author(s):  
David H. Wolpert

This is the second of two papers that use off-training-set (OTS) error to investigate the assumption-free relationship between learning algorithms. The first paper discusses a particular set of ways to compare learning algorithms, according to which there are no distinctions between learning algorithms. This second paper concentrates on different ways of comparing learning algorithms from those used in the first paper, and in particular discusses the associated a priori distinctions that do exist between learning algorithms. It is shown, loosely speaking, that for loss functions other than zero-one (e.g., quadratic loss), there are a priori distinctions between algorithms. However, even for such loss functions, it is shown here that any algorithm is equivalent on average to its “randomized” version, and in this sense still has no first-principles justification in terms of average error. Nonetheless, as this paper discusses, it may be that (for example) cross-validation has better head-to-head minimax properties than “anti-cross-validation” (choose the learning algorithm with the largest cross-validation error). This may be true even for zero-one loss, a loss function for which the notion of “randomization” would not be relevant. This paper also analyzes averages over hypotheses rather than targets. Such analyses hold for all possible priors over targets. Accordingly they prove, as a particular example, that cross-validation cannot be justified as a Bayesian procedure. In fact, for a very natural restriction of the class of learning algorithms, one should use anti-cross-validation rather than cross-validation (!).
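The contrast between cross-validation and "anti-cross-validation" selection described above can be made concrete. A minimal sketch with two toy "learning algorithms" and synthetic data (the models and names are illustrative, not Wolpert's constructions):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 40)
y = 2.0 * x + rng.normal(0, 0.1, 40)   # roughly linear target

# Two toy "learning algorithms": a constant-mean fit and a least-squares line.
def fit_mean(xt, yt):
    m = yt.mean()
    return lambda xq: np.full_like(xq, m, dtype=float)

def fit_line(xt, yt):
    a, b = np.polyfit(xt, yt, 1)
    return lambda xq: a * xq + b

def loo_cv_error(fit, x, y):
    """Leave-one-out cross-validation error under quadratic loss."""
    errs = []
    for i in range(len(x)):
        mask = np.arange(len(x)) != i
        pred = fit(x[mask], y[mask])(x[i:i + 1])[0]
        errs.append((pred - y[i]) ** 2)
    return float(np.mean(errs))

algos = {"mean": fit_mean, "line": fit_line}
cv = {name: loo_cv_error(f, x, y) for name, f in algos.items()}
chosen_cv = min(cv, key=cv.get)      # cross-validation: pick smallest CV error
chosen_anti = max(cv, key=cv.get)    # anti-cross-validation: pick largest CV error
print(cv, chosen_cv, chosen_anti)
```

On this linear target cross-validation picks the line and anti-cross-validation the constant; the paper's point is that averaged over all possible targets, neither selection rule has a first-principles advantage in expected OTS error.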


Author(s):  
Ginés Rubio ◽  
Héctor Pomares ◽  
Ignacio Rojas ◽  
Luis Javier Herrera ◽  
Alberto Guillén

Author(s):  
Dohyun Park ◽  
Yongbin Lee ◽  
Dong-Hoon Choi

Many meta-models have been developed to approximate true responses. These meta-models are often used for optimization in place of computer simulations, which incur high computational cost. However, designers do not know in advance which meta-model is best, because the accuracy of each meta-model differs from problem to problem. To address this difficulty, research on ensembles of meta-models that combine stand-alone meta-models has recently been pursued with the expectation of improving prediction accuracy. In this study, we propose a method for selecting weight factors for an ensemble of meta-models based on the v-nearest neighbors’ cross-validation error (CV). The four stand-alone meta-models employed in this study are polynomial regression, kriging, radial basis functions, and support vector regression. Each method is applied to five 1-D mathematical examples and ten 2-D mathematical examples. The prediction accuracy of each stand-alone meta-model and of existing ensembles of meta-models is compared. The proposed ensemble shows higher accuracy than the worst stand-alone meta-model in all 30 test examples, and the highest accuracy of all in 5 of the test cases. Although it can be less accurate than the best stand-alone meta-model, it has almost the same RMSE (ratio less than 1.1) as the best stand-alone model in 16 of the 30 test cases. From these results, we conclude that the proposed method is effective and robust.
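A minimal sketch of CV-weighted ensembling in the spirit of this abstract, using simple polynomial fits as stand-ins for the four meta-models and a plain inverse-error weighting heuristic (which may differ from the authors' v-nearest-neighbor scheme):

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 25)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.05, x.size)

# Three toy stand-alone meta-models: polynomials of degree 1, 3, and 6.
degrees = [1, 3, 6]

def loo_errors(deg, x, y):
    """Squared leave-one-out prediction error at each data point."""
    errs = np.empty(x.size)
    for i in range(x.size):
        mask = np.arange(x.size) != i
        coef = np.polyfit(x[mask], y[mask], deg)
        errs[i] = (np.polyval(coef, x[i]) - y[i]) ** 2
    return errs

# Cross-validation error (mean squared LOO error) per meta-model.
cv = np.array([loo_errors(d, x, y).mean() for d in degrees])

# Inverse-error weights: a lower CV error yields a larger weight.
w = (1.0 / cv) / np.sum(1.0 / cv)

# Ensemble prediction = weighted average of the stand-alone predictions.
models = [np.polyfit(x, y, d) for d in degrees]
def ensemble(xq):
    preds = np.array([np.polyval(c, xq) for c in models])
    return w @ preds

print(dict(zip(degrees, w.round(3))), ensemble(np.array([0.5])))
```

The weighting guarantees the ensemble cannot be much worse than its worst member (its weight is small) while tracking the best member closely, which matches the robustness the abstract reports.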


Author(s):  
Reza Alizadeh ◽  
Liangyue Jia ◽  
Anand Balu Nellippallil ◽  
Guoxin Wang ◽  
Jia Hao ◽  
...  

Abstract: In engineering design, surrogate models are often used instead of costly computer simulations. Typically, a single surrogate model is selected based on previous experience. We observe, based on an analysis of the published literature, that fitting an ensemble of surrogates (EoS) based on cross-validation errors is more accurate but requires more computational time. In this paper, we propose a method to build an EoS that is both accurate and less computationally expensive. In the proposed method, the EoS is a weighted-average surrogate of response surface models, kriging, and radial basis functions, with weights based on overall cross-validation error. We demonstrate that the created EoS is more accurate than the individual surrogates even when fewer data points are used, and is therefore computationally efficient, with relatively stable predictions. We demonstrate the use of the EoS using hot rod rolling as an example. Finally, we include a rule-based template that can be used for other problems with similar requirements regarding, for example, computational time, required accuracy, and the size of the data.
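One common way to turn overall cross-validation errors into weighted-average surrogate weights (an illustrative inverse-error scheme; the paper's exact weighting may differ) is

    w_i = (1 / E_i) / sum_j (1 / E_j),        y_hat(x) = sum_i w_i * y_hat_i(x)

where E_i is the overall cross-validation error of surrogate i (here response surface, kriging, or radial basis function), the weights w_i sum to one, and y_hat_i(x) is surrogate i's prediction at x. A surrogate with a large cross-validation error thus contributes little to the ensemble prediction.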


Author(s):  
Felipe A. C. Viana ◽  
Raphael T. Haftka

Surrogate models are commonly used to replace expensive simulations of engineering problems. Frequently, a single surrogate is chosen based on past experience. Previous work has shown that fitting multiple surrogates and picking one based on cross-validation errors (PRESS in particular) is a good strategy, and that cross-validation errors may also be used to create a weighted surrogate. In this paper, we discuss whether to use the best PRESS solution or a weighted surrogate when a single surrogate is needed. We propose the minimization of the integrated square error as a way to compute the weights of the weighted average surrogate. We find that it pays to generate a large set of different surrogates and then use PRESS as a criterion for selection. We find that the cross-validation error vectors provide an excellent estimate of the RMS errors when the number of data points is high. Hence the use of cross-validation errors for choosing a surrogate and for calculating the weights of weighted surrogates becomes more attractive in high dimensions. However, it appears that the potential gains from using weighted surrogates diminish substantially in high dimensions.
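PRESS, the sum of squared leave-one-out prediction errors, can be computed generically for any surrogate by refitting with each point held out. A minimal sketch with two illustrative surrogates, a linear and a quadratic response surface (the data and model choices are assumptions, not the paper's test problems):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 1, (30, 2))
y = x[:, 0] ** 2 + x[:, 1] + rng.normal(0, 0.02, 30)

def _quad_basis(xq):
    """Full quadratic basis in two variables."""
    return np.column_stack([np.ones(len(xq)), xq[:, 0], xq[:, 1],
                            xq[:, 0] ** 2, xq[:, 1] ** 2, xq[:, 0] * xq[:, 1]])

def fit_quadratic(xt, yt):
    coef, *_ = np.linalg.lstsq(_quad_basis(xt), yt, rcond=None)
    return lambda xq: _quad_basis(xq) @ coef

def fit_linear(xt, yt):
    A = np.column_stack([np.ones(len(xt)), xt])
    coef, *_ = np.linalg.lstsq(A, yt, rcond=None)
    return lambda xq: np.column_stack([np.ones(len(xq)), xq]) @ coef

def press(fit, x, y):
    """PRESS: sum of squared leave-one-out prediction errors."""
    e = []
    for i in range(len(x)):
        mask = np.arange(len(x)) != i
        pred = fit(x[mask], y[mask])(x[i:i + 1])[0]
        e.append((pred - y[i]) ** 2)
    return float(np.sum(e))

scores = {"linear": press(fit_linear, x, y),
          "quadratic": press(fit_quadratic, x, y)}
best = min(scores, key=scores.get)   # pick the surrogate with the lowest PRESS
print(scores, best)
```

Because the underlying response here is quadratic, PRESS correctly selects the quadratic surface; the per-point leave-one-out errors collected inside `press` are exactly the cross-validation error vector the abstract discusses.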


2020 ◽  
Author(s):  
Shan Wang ◽  
Junhua Ye ◽  
Qun Xu ◽  
Xin Xu ◽  
Yingying Yang ◽  
...  

Abstract Background: Classification of germplasm collections is of great importance for both the conservation and utilization of genetic resources. It is therefore necessary to estimate and classify rice varieties in order to utilize these germplasms more efficiently for rice breeding. However, molecular classification of large germplasm collections can be costly and labor-intensive. Development of an informative panel of a few markers would allow rapid and cost-effective assignment of crops to genetic sub-populations.

Results: Here, the minimum number of random SNPs for rice classification (MNRSRC) was studied using a panel of 51 rice varieties belonging to different sub-groups. Genetic structure analysis showed that the rice panel can be clearly divided into five subgroups. The MNRSRC was estimated using a SNP random-sampling method based on genetic diversity and population structure analysis. In the genetic diversity analysis, the coefficient of variation (CV) was analyzed statistically for MNRSRC estimation, and we found that the CV tended to plateau when the number of SNPs was around 200; this was verified by both the cross-validation error of the K value and the correlation analysis of genetic distance. When the number of SNPs was greater than 200, the distribution of the cross-validation error values tended to be similar, and the correlation coefficients, almost all greater than 0.95, exhibited a small range of variation. In addition, we found that the MNRSRC might not be affected by the number or type of varieties.

Conclusion: The results demonstrate that at least about 200 randomly filtered SNP loci are required for classification in a rice panel, and that the MNRSRC might not be affected by the number or type of varieties. This study of the MNRSRC can provide a reference and theoretical basis for the classification of different types of rice panels.
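The correlation check described in the Results, comparing genetic distances computed from a random SNP subset against those from the full panel, can be sketched as follows. This uses synthetic, unstructured genotypes and Euclidean distance as a stand-in, so the numbers will not reproduce the 0.95 correlations reported for real, structured data:

```python
import numpy as np

rng = np.random.default_rng(4)
n_var, n_snp = 51, 2000
# Synthetic 0/1/2 genotype matrix: 51 varieties x 2000 SNP loci.
geno = rng.integers(0, 3, size=(n_var, n_snp)).astype(float)

def dist_matrix(g):
    """Pairwise genetic distance (Euclidean, as a simple stand-in)."""
    sq = np.sum(g ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * g @ g.T
    return np.sqrt(np.maximum(d2, 0))

full = dist_matrix(geno)
iu = np.triu_indices(n_var, k=1)   # upper triangle: one entry per variety pair

def subset_correlation(k):
    """Correlation between distances from k randomly sampled SNPs
    and distances from the full panel."""
    cols = rng.choice(n_snp, size=k, replace=False)
    sub = dist_matrix(geno[:, cols])
    return float(np.corrcoef(sub[iu], full[iu])[0, 1])

for k in (50, 200, 1000):
    print(k, round(subset_correlation(k), 3))
```

On real data with population structure, the correlation plateaus once the subset captures the structure (around 200 SNPs in the abstract); the cross-validation-error-of-K check would additionally require a structure tool such as ADMIXTURE, which is outside this sketch.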

