Efficient Sampling Methods for Machine Learning Error Models with Application to Surrogates of Steady Hypersonic Flows

2022 ◽  
Author(s):  
Elizabeth H. Krath ◽  
David S. Ching ◽  
Patrick J. Blonigan


2018 ◽  
Vol 26 (1) ◽  
pp. 141-155 ◽  
Author(s):  
Li Luo ◽  
Fengyi Zhang ◽  
Yao Yao ◽  
RenRong Gong ◽  
Martina Fu ◽  
...  

Surgery cancellations waste scarce operative resources and hinder patients’ access to operative services. In this study, the Wilcoxon and chi-square tests were used for predictor selection, and three machine learning models (random forest, support vector machine, and XGBoost) were used to identify surgeries with high risks of cancellation. The optimal performance of the identification models was as follows: sensitivity, 0.615; specificity, 0.957; positive predictive value, 0.454; negative predictive value, 0.904; accuracy, 0.647; and area under the receiver operating characteristic curve, 0.682. Of the three models, the random forest model achieved the best performance. Thus, the effective identification of surgeries with high risks of cancellation is feasible with stable performance. The choice of model and sampling method significantly affects identification performance. This study is a new application of machine learning to the identification of surgeries with high risks of cancellation and to the facilitation of surgical resource management.
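The abstract does not name an implementation; as a loose illustration only, the sketch below (scikit-learn, with a synthetic stand-in for the surgical-schedule features) trains a random forest and reports the same metrics listed above.

```python
# Hypothetical sketch of the reported workflow: random forest classification of
# surgery-cancellation risk, evaluated with the metrics quoted in the abstract.
# Predictor selection (Wilcoxon/chi-square) is omitted; X and y are placeholders.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, roc_auc_score

# Placeholder data standing in for the (unavailable) surgical-schedule features.
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.85, 0.15],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

model = RandomForestClassifier(n_estimators=300, class_weight="balanced",
                               random_state=0)
model.fit(X_tr, y_tr)

tn, fp, fn, tp = confusion_matrix(y_te, model.predict(X_te)).ravel()
print("sensitivity", tp / (tp + fn))
print("specificity", tn / (tn + fp))
print("PPV        ", tp / (tp + fp))
print("NPV        ", tn / (tn + fn))
print("accuracy   ", (tp + tn) / (tp + tn + fp + fn))
print("AUC        ", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```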


Author(s):  
Brian K. Beachkofski ◽  
Ramana V. Grandhi

For probabilistic designs or assessments to be acceptable, they must have the statistically robust confidence intervals provided by sampling methods. However, sample-based analyses require so many function evaluations that they are impractical for many complex engineering applications. Efficient sampling methods allow probabilistic analysis of more applications than basic methods do, although they still require a significant computational budget. This paper reviews a series of tools that aim to reduce the variance of individual failure-rate estimates, which would narrow the confidence interval for the same number of evaluations. Several methods share a common goal, lowering the discrepancy of points within the sample space, to create near-optimal low-discrepancy sample sets. The optimization approaches include evolutionary algorithms, piecewise optimization, and centroidal Voronoi tessellation. The results of the optimization procedures show a much lower discrepancy than previous methods achieve.
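None of the paper's optimizers is reproduced here, but the discrepancy gap they target can be illustrated with SciPy's quasi-Monte Carlo module; the comparison below of pseudo-random, scrambled Sobol', and Latin hypercube point sets is a rough sketch, not the authors' code.

```python
# Illustrative comparison of sample-set discrepancy (not the paper's optimizers):
# lower centered discrepancy generally means more uniform coverage of the space.
import numpy as np
from scipy.stats import qmc

rng = np.random.default_rng(0)
n, d = 256, 4  # 256 points in a 4-dimensional unit hypercube

random_pts = rng.random((n, d))
sobol_pts = qmc.Sobol(d=d, scramble=True, seed=0).random_base2(m=8)  # 2**8 = 256
lhs_pts = qmc.LatinHypercube(d=d, seed=0).random(n)

for name, pts in [("pseudo-random", random_pts),
                  ("scrambled Sobol'", sobol_pts),
                  ("Latin hypercube", lhs_pts)]:
    print(f"{name:18s} centered discrepancy = {qmc.discrepancy(pts):.5f}")
```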


2020 ◽  
Author(s):  
Laura Bindereif ◽  
Tobias Rentschler ◽  
Martin Bartelheim ◽  
Marta Díaz-Zorita Bonilla ◽  
Philipp Gries ◽  
...  

Land cover information plays an essential role in resource development, environmental monitoring and protection. Amongst other natural resources, soils and soil properties are strongly affected by land cover and land cover change, which can lead to soil degradation. Remote sensing techniques are well suited for spatio-temporal land cover mapping and change detection, and remote sensing programs have established vast data archives. Machine learning applications provide appropriate algorithms to analyse such amounts of data efficiently and with accurate results. However, machine learning methods require specific sampling techniques and are usually designed for balanced datasets with an even training sample frequency. Most real-world datasets are imbalanced, though, so methods that reduce the imbalance of datasets with synthetic sampling are required. Synthetic sampling methods increase the number of samples in the minority class and/or decrease the number in the majority class to achieve higher model accuracy. The Synthetic Minority Over-Sampling Technique (SMOTE) is a method used in many machine learning applications to generate synthetic samples and balance the dataset. In the middle Guadalquivir basin, Andalusia, Spain, we used random forests with Landsat images from 1984 to 2018 as covariates to map land cover change with Google Earth Engine. The sampling design was based on stratified random sampling according to the CORINE land cover classification of 2012. The land cover classes in our study were arable land, permanent crops (plantations), pastures/grassland, forest and shrub; artificial surfaces and water bodies were excluded from modelling. However, the 130 training samples were imbalanced: the classes pasture (7 samples) and shrub (13 samples) have fewer samples than the other classes (48, 47 and 16 samples). This led to misclassifications and negatively affected the classification accuracy. Therefore, we applied SMOTE to increase the number of samples and the classification accuracy of the model. Preliminary results are promising and show an increase in classification accuracy, especially for the previously underrepresented classes pasture and shrub. This corresponds to the results of studies with other objectives, which also find that synthetic sampling methods improve the performance of classification frameworks.
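The abstract does not state which SMOTE implementation was used; a minimal sketch with the imbalanced-learn package, keeping only the class counts from the abstract and substituting placeholder features, might look like this.

```python
# Rough sketch of SMOTE-balanced random forest training; the Landsat covariates
# are replaced by random placeholders, only the class counts follow the abstract.
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
counts = {"arable": 48, "permanent_crops": 47, "pasture_grassland": 7,
          "forest": 16, "shrub": 13}  # imbalanced training set from the abstract

X = rng.random((sum(counts.values()), 10))          # placeholder spectral features
y = np.repeat(list(counts), list(counts.values()))  # class labels

# SMOTE's default k_neighbors=5 needs at least 6 samples per minority class,
# which the smallest class (pasture, 7 samples) just satisfies.
X_res, y_res = SMOTE(k_neighbors=5, random_state=42).fit_resample(X, y)
print(dict(zip(*np.unique(y_res, return_counts=True))))  # all classes now equal

rf = RandomForestClassifier(n_estimators=500, random_state=42).fit(X_res, y_res)
```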


2020 ◽  
Vol 2020 ◽  
pp. 1-17 ◽  
Author(s):  
Muhammed Kürşad Uçar ◽  
Majid Nour ◽  
Hatem Sindi ◽  
Kemal Polat

The training and testing process for the classification of biomedical datasets is a critical step in machine learning, and the researcher should carefully choose the methods used at each step. However, there are very few studies on these method choices, and those in the literature are generally theoretical. Moreover, there is no practical model for how to select samples for the training and testing process. There is therefore a need for resources in machine learning that discuss the training and testing process in detail and offer new recommendations. This article provides a detailed analysis of the training and testing process in machine learning and is organized as follows. The third section describes how the datasets were prepared; four balanced datasets were used for the application. The fourth section describes the training/testing ratio and how to select samples at the training and testing stage. Sampling theory, a branch of statistics, describes how to select samples, and this article proposes using sampling methods in the machine learning training and testing process. The fourth section also covers the theoretical expression of four different sampling theorems, and the results section reports their performance. The fifth section describes the methods by which training and pretest features can be selected. Three different classifiers were used to check performance. The results section describes how the results should be analyzed, and performance evaluation methods are proposed for this purpose. This article examines in detail the effect of the training and testing process on performance in machine learning and proposes the use of sampling theorems for the training and testing process. According to the results, the datasets, feature selection algorithms, classifiers, and training/test ratio are criteria that directly affect performance. However, the method of selecting samples at the training and testing stages is vital for the system to work correctly. To design a stable system, it is recommended that samples be selected with a stratified systematic sampling theorem.
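The article's exact procedure is not given in the abstract; the function below is one plausible reading of stratified systematic sampling for a train/test split, in which every k-th sample within each class is taken from a random starting offset.

```python
# One possible reading of stratified systematic sampling for a train/test split:
# within each class (stratum) take every k-th index from a random start offset.
import numpy as np

def stratified_systematic_split(y, test_fraction=0.3, seed=0):
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    test_idx = []
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)
        n_test = max(1, int(round(test_fraction * idx.size)))
        step = idx.size / n_test               # systematic sampling interval
        start = rng.uniform(0, step)           # random start within the first interval
        picks = (start + step * np.arange(n_test)).astype(int)
        test_idx.append(idx[picks])
    test_idx = np.concatenate(test_idx)
    train_idx = np.setdiff1d(np.arange(y.size), test_idx)
    return train_idx, test_idx

# Example: an imbalanced label vector; each class contributes ~30% to the test set.
labels = np.array([0] * 70 + [1] * 30)
train_idx, test_idx = stratified_systematic_split(labels, test_fraction=0.3)
print(len(train_idx), len(test_idx))  # 70 30
```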


2020 ◽  
Vol 22 (25) ◽  
pp. 14364-14374
Author(s):  
Wojciech Plazinski ◽  
Anita Plazinska ◽  
Agnieszka Brzyska

A method extending the range of applicability of machine-learning force fields is proposed. It relies on biased subsampling of the high-energy states described by the predefined coordinate(s).
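No implementation detail is given in the abstract, so the following is only a schematic guess at what biased subsampling toward high-energy states along a predefined coordinate could look like; the paper's actual scheme may differ.

```python
# Schematic guess at biased subsampling toward high-energy states along a
# predefined coordinate; the actual scheme used in the paper may differ.
import numpy as np

rng = np.random.default_rng(1)

# Placeholder trajectory: a collective coordinate q and per-frame energies E (kJ/mol).
q = rng.uniform(-1.0, 1.0, size=10_000)
E = 50.0 * q**2 + rng.normal(0.0, 2.0, size=q.size)   # toy harmonic energy profile

beta_bias = 1.0 / 10.0                 # bias "temperature" controlling enrichment
w = np.exp(beta_bias * (E - E.max()))  # up-weight high-energy frames (stable exp)
w /= w.sum()

subset = rng.choice(q.size, size=1_000, replace=False, p=w)
print("mean energy, all frames:      ", E.mean())
print("mean energy, biased subsample:", E[subset].mean())
```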


2020 ◽  
pp. 1-23
Author(s):  
Moslem Borji Hassangavyar ◽  
Hadi Eskandari Damaneh ◽  
Quoc Bao Pham ◽  
Nguyen Thi Thuy Linh ◽  
John Tiefenbacher ◽  
...  

2009 ◽  
Vol 24 (1) ◽  
pp. 5-10 ◽  
Author(s):  
Mark J. Ducey

Abstract The ratio between additive and original versions of Reineke's stand density index (SDI) has been used as a descriptor of stand structural complexity. That ratio can also be informative for designing efficient sampling methods and silvicultural experiments. Previous analyses of this ratio have assumed a diameter distribution without truncation, such that trees from zero to infinite dbh are possible. Truncating the diameter distribution, e.g., by tallying only trees larger than some minimum dbh, moves the ratio much closer to one when the stand has a classic balanced uneven-aged structure. Minimum values of the ratio are found not with classic reverse-J distributions, but with sharply bimodal distributions that might be typical of a two-cohort stand. The implications for the use of novel sampling methods and for experimental designs to test whether the additive or original SDI provides better prediction in irregular stands are discussed.
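For readers unfamiliar with the two forms of SDI, the sketch below computes the additive/original ratio from a per-hectare tree list; the 25 cm reference diameter and the 1.605 exponent are the conventional choices, assumed here rather than quoted from the article.

```python
# Sketch of the additive vs. original Reineke SDI ratio from a per-hectare tree
# list; 25 cm reference diameter and the 1.605 exponent are conventional values.
import numpy as np

def sdi_ratio(dbh_cm):
    dbh_cm = np.asarray(dbh_cm, dtype=float)
    n = dbh_cm.size                                  # trees per hectare (equal weights)
    dq = np.sqrt(np.mean(dbh_cm**2))                 # quadratic mean diameter
    sdi_original = n * (dq / 25.0) ** 1.605
    sdi_additive = np.sum((dbh_cm / 25.0) ** 1.605)
    return sdi_additive / sdi_original               # always <= 1

rng = np.random.default_rng(0)
reverse_j = rng.exponential(scale=12.0, size=2000) + 5.0  # many small, few large trees
bimodal = np.concatenate([rng.normal(12, 1.5, 1000), rng.normal(45, 3.0, 1000)])

print("reverse-J ratio: ", sdi_ratio(reverse_j))
print("two-cohort ratio:", sdi_ratio(bimodal))
```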

