Efficient Sampling Methods for Machine Learning Error Models with Application to Surrogates of Steady Hypersonic Flows

2022 ◽  
Author(s):  
Elizabeth H. Krath ◽  
David S. Ching ◽  
Patrick J. Blonigan


2018 ◽  
Vol 26 (1) ◽  
pp. 141-155 ◽  
Author(s):  
Li Luo ◽  
Fengyi Zhang ◽  
Yao Yao ◽  
RenRong Gong ◽  
Martina Fu ◽  
...  

Surgery cancellations waste scarce operative resources and hinder patients’ access to operative services. In this study, the Wilcoxon and chi-square tests were used for predictor selection, and three machine learning models (random forest, support vector machine, and XGBoost) were used to identify surgeries with high risks of cancellation. The optimal performance of the identification models was as follows: sensitivity, 0.615; specificity, 0.957; positive predictive value, 0.454; negative predictive value, 0.904; accuracy, 0.647; and area under the receiver operating characteristic curve, 0.682. Of the three models, the random forest model achieved the best performance. Thus, the effective identification of surgeries with high risks of cancellation is feasible with stable performance. The choice of model and sampling method significantly affects identification performance. This study is a new application of machine learning to the identification of surgeries with high risks of cancellation and to the facilitation of surgical resource management.
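The abstract does not name an implementation; as a loose illustration only, the sketch below (scikit-learn, with a synthetic stand-in for the surgical-schedule features) trains a random forest and reports the same metrics listed above.

```python
# Hypothetical sketch of the reported workflow: random forest classification of
# surgery-cancellation risk, evaluated with the metrics quoted in the abstract.
# Predictor selection (Wilcoxon/chi-square) is omitted; X and y are placeholders.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, roc_auc_score

# Placeholder data standing in for the (unavailable) surgical-schedule features.
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.85, 0.15],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

model = RandomForestClassifier(n_estimators=300, class_weight="balanced",
                               random_state=0)
model.fit(X_tr, y_tr)

tn, fp, fn, tp = confusion_matrix(y_te, model.predict(X_te)).ravel()
print("sensitivity", tp / (tp + fn))
print("specificity", tn / (tn + fp))
print("PPV        ", tp / (tp + fp))
print("NPV        ", tn / (tn + fn))
print("accuracy   ", (tp + tn) / (tp + tn + fp + fn))
print("AUC        ", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```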


Author(s):  
Brian K. Beachkofski ◽  
Ramana V. Grandhi

For probabilistic designs or assessments to be acceptable, they must have the statistically robust confidence intervals provided by sampling methods. However, sample-based analyses require so many function evaluations that they are impractical for many complex engineering applications. Efficient sampling methods allow probabilistic analysis of more applications than basic methods do, although they still require a significant computational budget. This paper reviews a series of tools that aim to reduce the variance of individual failure-rate estimates, which would narrow the confidence interval for the same number of evaluations. Several methods share a common goal, lowering the discrepancy of points within the sample space, to create near-optimal low-discrepancy sample sets. The optimization approaches include evolutionary algorithms, piecewise optimization, and centroidal Voronoi tessellation. The results of the optimization procedures show a much lower discrepancy than previous methods achieve.
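None of the paper's optimizers is reproduced here, but the discrepancy gap they target can be illustrated with SciPy's quasi-Monte Carlo module; the comparison below of pseudo-random, scrambled Sobol', and Latin hypercube point sets is a rough sketch, not the authors' code.

```python
# Illustrative comparison of sample-set discrepancy (not the paper's optimizers):
# lower centered discrepancy generally means more uniform coverage of the space.
import numpy as np
from scipy.stats import qmc

rng = np.random.default_rng(0)
n, d = 256, 4  # 256 points in a 4-dimensional unit hypercube

random_pts = rng.random((n, d))
sobol_pts = qmc.Sobol(d=d, scramble=True, seed=0).random_base2(m=8)  # 2**8 = 256
lhs_pts = qmc.LatinHypercube(d=d, seed=0).random(n)

for name, pts in [("pseudo-random", random_pts),
                  ("scrambled Sobol'", sobol_pts),
                  ("Latin hypercube", lhs_pts)]:
    print(f"{name:18s} centered discrepancy = {qmc.discrepancy(pts):.5f}")
```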


2020 ◽  
Author(s):  
Laura Bindereif ◽  
Tobias Rentschler ◽  
Martin Bartelheim ◽  
Marta Díaz-Zorita Bonilla ◽  
Philipp Gries ◽  
...  

Land cover information plays an essential role in resource development, environmental monitoring and protection. Amongst other natural resources, soils and soil properties are strongly affected by land cover and land cover change, which can lead to soil degradation. Remote sensing techniques are well suited for spatio-temporal land cover mapping and change detection, and remote sensing programs have established vast data archives. Machine learning applications provide appropriate algorithms to analyse such amounts of data efficiently and with accurate results. However, machine learning methods require specific sampling techniques and are usually designed for balanced datasets with an even training sample frequency. Most real-world datasets are imbalanced, though, so methods that reduce the imbalance of datasets with synthetic sampling are required. Synthetic sampling methods increase the number of samples in the minority class and/or decrease the number in the majority class to achieve higher model accuracy. The Synthetic Minority Over-Sampling Technique (SMOTE) is a method used in many machine learning applications to generate synthetic samples and balance the dataset. In the middle Guadalquivir basin, Andalusia, Spain, we used random forests with Landsat images from 1984 to 2018 as covariates to map land cover change with Google Earth Engine. The sampling design was based on stratified random sampling according to the CORINE land cover classification of 2012. The land cover classes in our study were arable land, permanent crops (plantations), pastures/grassland, forest and shrub; artificial surfaces and water bodies were excluded from modelling. However, the 130 training samples were imbalanced: the classes pasture (7 samples) and shrub (13 samples) have fewer samples than the other classes (48, 47 and 16 samples). This led to misclassifications and negatively affected the classification accuracy. Therefore, we applied SMOTE to increase the number of samples and the classification accuracy of the model. Preliminary results are promising and show an increase in classification accuracy, especially for the previously underrepresented classes pasture and shrub. This corresponds to the results of studies with other objectives, which also find that synthetic sampling methods improve the performance of classification frameworks.
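The abstract does not state which SMOTE implementation was used; a minimal sketch with the imbalanced-learn package, keeping only the class counts from the abstract and substituting placeholder features, might look like this.

```python
# Rough sketch of SMOTE-balanced random forest training; the Landsat covariates
# are replaced by random placeholders, only the class counts follow the abstract.
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
counts = {"arable": 48, "permanent_crops": 47, "pasture_grassland": 7,
          "forest": 16, "shrub": 13}  # imbalanced training set from the abstract

X = rng.random((sum(counts.values()), 10))          # placeholder spectral features
y = np.repeat(list(counts), list(counts.values()))  # class labels

# SMOTE's default k_neighbors=5 needs at least 6 samples per minority class,
# which the smallest class (pasture, 7 samples) just satisfies.
X_res, y_res = SMOTE(k_neighbors=5, random_state=42).fit_resample(X, y)
print(dict(zip(*np.unique(y_res, return_counts=True))))  # all classes now equal

rf = RandomForestClassifier(n_estimators=500, random_state=42).fit(X_res, y_res)
```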


2020 ◽  
Vol 2020 ◽  
pp. 1-17 ◽  
Author(s):  
Muhammed Kürşad Uçar ◽  
Majid Nour ◽  
Hatem Sindi ◽  
Kemal Polat

The training and testing process for the classification of biomedical datasets is a critical step in machine learning, and the researcher should carefully choose the methods used at each step. However, there are very few studies on these method choices, and those in the literature are generally theoretical. Moreover, there is no practical model for how to select samples for the training and testing process. There is therefore a need for resources in machine learning that discuss the training and testing process in detail and offer new recommendations. This article provides a detailed analysis of the training and testing process in machine learning and is organized as follows. The third section describes how the datasets were prepared; four balanced datasets were used for the application. The fourth section describes the training/testing ratio and how to select samples at the training and testing stage. Sampling theory, a branch of statistics, describes how to select samples, and this article proposes using sampling methods in the machine learning training and testing process. The fourth section also covers the theoretical expression of four different sampling theorems, and the results section reports their performance. The fifth section describes the methods by which training and pretest features can be selected. Three different classifiers were used to check performance. The results section describes how the results should be analyzed, and performance evaluation methods are proposed for this purpose. This article examines in detail the effect of the training and testing process on performance in machine learning and proposes the use of sampling theorems for the training and testing process. According to the results, the datasets, feature selection algorithms, classifiers, and training/test ratio are criteria that directly affect performance. However, the method of selecting samples at the training and testing stages is vital for the system to work correctly. To design a stable system, it is recommended that samples be selected with a stratified systematic sampling theorem.
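The article's exact procedure is not given in the abstract; the function below is one plausible reading of stratified systematic sampling for a train/test split, in which every k-th sample within each class is taken from a random starting offset.

```python
# One possible reading of stratified systematic sampling for a train/test split:
# within each class (stratum) take every k-th index from a random start offset.
import numpy as np

def stratified_systematic_split(y, test_fraction=0.3, seed=0):
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    test_idx = []
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)
        n_test = max(1, int(round(test_fraction * idx.size)))
        step = idx.size / n_test               # systematic sampling interval
        start = rng.uniform(0, step)           # random start within the first interval
        picks = (start + step * np.arange(n_test)).astype(int)
        test_idx.append(idx[picks])
    test_idx = np.concatenate(test_idx)
    train_idx = np.setdiff1d(np.arange(y.size), test_idx)
    return train_idx, test_idx

# Example: an imbalanced label vector; each class contributes ~30% to the test set.
labels = np.array([0] * 70 + [1] * 30)
train_idx, test_idx = stratified_systematic_split(labels, test_fraction=0.3)
print(len(train_idx), len(test_idx))  # 70 30
```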


2020 ◽  
Vol 22 (25) ◽  
pp. 14364-14374
Author(s):  
Wojciech Plazinski ◽  
Anita Plazinska ◽  
Agnieszka Brzyska

A method extending the range of applicability of machine-learning force fields is proposed. It relies on biased subsampling of the high-energy states described by the predefined coordinate(s).
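No implementation detail is given in the abstract, so the following is only a schematic guess at what biased subsampling toward high-energy states along a predefined coordinate could look like; the paper's actual scheme may differ.

```python
# Schematic guess at biased subsampling toward high-energy states along a
# predefined coordinate; the actual scheme used in the paper may differ.
import numpy as np

rng = np.random.default_rng(1)

# Placeholder trajectory: a collective coordinate q and per-frame energies E (kJ/mol).
q = rng.uniform(-1.0, 1.0, size=10_000)
E = 50.0 * q**2 + rng.normal(0.0, 2.0, size=q.size)   # toy harmonic energy profile

beta_bias = 1.0 / 10.0                 # bias "temperature" controlling enrichment
w = np.exp(beta_bias * (E - E.max()))  # up-weight high-energy frames (stable exp)
w /= w.sum()

subset = rng.choice(q.size, size=1_000, replace=False, p=w)
print("mean energy, all frames:      ", E.mean())
print("mean energy, biased subsample:", E[subset].mean())
```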


2020 ◽  
pp. 1-23
Author(s):  
Moslem Borji Hassangavyar ◽  
Hadi Eskandari Damaneh ◽  
Quoc Bao Pham ◽  
Nguyen Thi Thuy Linh ◽  
John Tiefenbacher ◽  
...  

2009 ◽  
Vol 24 (1) ◽  
pp. 5-10 ◽  
Author(s):  
Mark J. Ducey

Abstract The ratio between additive and original versions of Reineke's stand density index (SDI) has been used as a descriptor of stand structural complexity. That ratio can also be informative for designing efficient sampling methods and silvicultural experiments. Previous analyses of this ratio have assumed a diameter distribution without truncation, such that trees from zero to infinite dbh are possible. Truncating the diameter distribution, e.g., by tallying only trees larger than some minimum dbh, moves the ratio much closer to one when the stand has a classic balanced uneven-aged structure. Minimum values of the ratio are found not with classic reverse-J distributions, but with sharply bimodal distributions that might be typical of a two-cohort stand. The implications for the use of novel sampling methods and for experimental designs to test whether the additive or original SDI provides better prediction in irregular stands are discussed.
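For readers unfamiliar with the two forms of SDI, the sketch below computes the additive/original ratio from a per-hectare tree list; the 25 cm reference diameter and the 1.605 exponent are the conventional choices, assumed here rather than quoted from the article.

```python
# Sketch of the additive vs. original Reineke SDI ratio from a per-hectare tree
# list; 25 cm reference diameter and the 1.605 exponent are conventional values.
import numpy as np

def sdi_ratio(dbh_cm):
    dbh_cm = np.asarray(dbh_cm, dtype=float)
    n = dbh_cm.size                                  # trees per hectare (equal weights)
    dq = np.sqrt(np.mean(dbh_cm**2))                 # quadratic mean diameter
    sdi_original = n * (dq / 25.0) ** 1.605
    sdi_additive = np.sum((dbh_cm / 25.0) ** 1.605)
    return sdi_additive / sdi_original               # always <= 1

rng = np.random.default_rng(0)
reverse_j = rng.exponential(scale=12.0, size=2000) + 5.0  # many small, few large trees
bimodal = np.concatenate([rng.normal(12, 1.5, 1000), rng.normal(45, 3.0, 1000)])

print("reverse-J ratio: ", sdi_ratio(reverse_j))
print("two-cohort ratio:", sdi_ratio(bimodal))
```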

