computational budget
Recently Published Documents


TOTAL DOCUMENTS: 54 (FIVE YEARS: 29)
H-INDEX: 8 (FIVE YEARS: 2)

2021
Author(s): Qiyuan Zhao, Hsuan-Hao Hsu, Brett Savoie

Transition state searches are the basis for characterizing reaction mechanisms and activation energies, and are thus central to myriad chemical applications. Nevertheless, common search algorithms are sensitive to molecular conformation, and the conformational space of even medium-sized reacting systems is too complex to explore with brute force. Here we show that it is possible to train a classifier to learn the features of conformers that are conducive to successful transition state searches, such that optimal conformers can be down-selected before incurring the cost of a high-level transition state search. To this end, we have benchmarked the use of a modern conformational generation algorithm with our reaction prediction methodology, Yet Another Reaction Program (YARP), for reaction prediction tasks. We demonstrate that neglecting conformer contributions leads to qualitatively incorrect activation energy estimates for a broad range of reactions, whereas a simple random forest classifier can be used to reliably down-select low-barrier conformers. We also compare the relative advantages of performing conformational sampling on reactant, product, and putative transition state geometries. The robust performance of this relatively simple machine learning classifier mitigates cost as a factor when implementing conformational sampling in contemporary reaction prediction workflows.
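
As a rough illustration of the down-selection step described above, the sketch below trains a random forest classifier to rank conformers by their predicted chance of yielding a successful, low-barrier transition state search. The descriptor set, labels, and data are placeholders, not the YARP implementation.

```python
# Illustrative sketch (not the authors' YARP code): a random forest classifier
# used to down-select conformers before an expensive transition state search.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical training data: one descriptor vector per conformer, labelled 1 if
# the subsequent high-level TS search succeeded with a low barrier, 0 otherwise.
X_train = rng.normal(size=(500, 12))        # 500 conformers, 12 descriptors
y_train = rng.integers(0, 2, size=500)      # placeholder success labels

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

# Rank new candidate conformers by predicted success probability and keep only
# the top few for the expensive transition state search.
X_candidates = rng.normal(size=(50, 12))
scores = clf.predict_proba(X_candidates)[:, 1]
selected = np.argsort(scores)[::-1][:5]
print("Conformers passed to the TS search:", selected)
```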


Author(s): Chris Sherlock, Anthony Lee

Abstract A delayed-acceptance version of a Metropolis–Hastings algorithm can be useful for Bayesian inference when it is computationally expensive to calculate the true posterior but a computationally cheap approximation is available; the delayed-acceptance kernel targets the same posterior as its associated “parent” Metropolis–Hastings kernel. Although the asymptotic variance of the ergodic average of any functional of the delayed-acceptance chain cannot be less than that obtained using its parent, the average computational time per iteration can be much smaller, so for a given computational budget the delayed-acceptance kernel can be more efficient. When the asymptotic variances of the ergodic averages of all $L^2$ functionals of the chain are finite, the kernel is said to be variance bounding. It has recently been noted that a delayed-acceptance kernel need not be variance bounding even when its parent is. We provide sufficient conditions for inheritance: for non-local algorithms, such as the independence sampler, the discrepancy between the log density of the approximation and that of the truth should be bounded; for local algorithms, two alternative sets of conditions are provided. As a by-product of our initial, general result, we also supply sufficient conditions on any pair of proposals such that, for any shared target distribution, if a Metropolis–Hastings kernel using one of the proposals is variance bounding then so is the Metropolis–Hastings kernel using the other proposal.
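
To make the two-stage acceptance rule concrete, here is a minimal sketch of a single delayed-acceptance Metropolis–Hastings step with a symmetric random-walk proposal; log_post_true and log_post_cheap are toy stand-ins for the expensive posterior and its cheap approximation, not anything from the paper.

```python
# Minimal delayed-acceptance Metropolis-Hastings sketch (symmetric proposal).
import numpy as np

rng = np.random.default_rng(1)

def log_post_true(x):                  # expensive target (toy stand-in)
    return -0.5 * np.sum(x ** 2)

def log_post_cheap(x):                 # cheap, slightly biased approximation
    return -0.5 * np.sum(x ** 2) + 0.05 * np.sum(np.sin(x))

def da_mh_step(x, step=0.5):
    y = x + step * rng.normal(size=x.shape)
    # Stage 1: screen the proposal using only the cheap approximation.
    if np.log(rng.uniform()) >= log_post_cheap(y) - log_post_cheap(x):
        return x                       # rejected cheaply, no expensive call made
    # Stage 2: correct with the true posterior so the chain still targets it.
    log_a2 = (log_post_true(y) - log_post_true(x)) \
           - (log_post_cheap(y) - log_post_cheap(x))
    return y if np.log(rng.uniform()) < log_a2 else x

x = np.zeros(2)
for _ in range(1000):
    x = da_mh_step(x)
```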


2021, pp. 1-21
Author(s): Thomas Helmuth, Lee Spector

Abstract In genetic programming, an evolutionary method for producing computer programs that solve specified computational problems, parent selection is ordinarily based on aggregate measures of performance across an entire training set. Lexicase selection, by contrast, selects on the basis of performance on random sequences of training cases; this has been shown to enhance problem-solving power in many circumstances. Lexicase selection can also be seen as better reflecting biological evolution, by modeling sequences of challenges that organisms face over their lifetimes. Recent work has demonstrated that the advantages of lexicase selection can be amplified by down-sampling, meaning that only a random subsample of the training cases is used each generation. This can be seen as modeling the fact that individual organisms encounter only subsets of the possible environments and that environments change over time. Here we provide the most extensive benchmarking of down-sampled lexicase selection to date, showing that its benefits hold up to increased scrutiny. The reasons that down-sampling helps, however, are not yet fully understood. Hypotheses include that down-sampling allows for more generations to be processed with the same budget of program evaluations; that the variation of training data across generations acts as a changing environment, encouraging adaptation; or that it reduces overfitting, leading to more general solutions. We systematically evaluate these hypotheses, finding evidence against all three, and instead draw the conclusion that down-sampled lexicase selection's main benefit stems from the fact that it allows the evolutionary process to examine more individuals within the same computational budget, even though each individual is examined less completely.
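
The sketch below illustrates the mechanics of down-sampled lexicase selection in a few lines: the case subset is drawn once per generation, and each parent selection then filters the population through a shuffled ordering of those cases. The error matrix, population size, and subsample size are made up for illustration.

```python
# Sketch of down-sampled lexicase parent selection.
# errors[i][c] is individual i's error on training case c (placeholder values).
import random

def lexicase_select(errors, cases):
    order = list(cases)
    random.shuffle(order)                    # each selection uses its own case order
    candidates = list(range(len(errors)))
    for c in order:
        best = min(errors[i][c] for i in candidates)
        candidates = [i for i in candidates if errors[i][c] == best]
        if len(candidates) == 1:
            break
    return random.choice(candidates)

NUM_CASES, POP = 8, 6
errors = [[random.randint(0, 3) for _ in range(NUM_CASES)] for _ in range(POP)]

# Down-sampling: each generation evaluates only a random subset of the cases,
# so the saved evaluations can fund more individuals under the same budget.
sampled_cases = random.sample(range(NUM_CASES), k=2)
parents = [lexicase_select(errors, sampled_cases) for _ in range(POP)]
```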


Author(s): Christian Perron, Dushhyanth Rajaram, Dimitri N. Mavris

This work presents the development of a multi-fidelity, parametric and non-intrusive reduced-order modelling method to tackle the problem of achieving an acceptable predictive accuracy under a limited computational budget, i.e. with expensive simulations and sparse training data. Traditional multi-fidelity surrogate models that predict scalar quantities address this issue by leveraging auxiliary data generated by a computationally cheaper lower-fidelity code. However, for the prediction of field quantities, simulations of different fidelities may produce responses with inconsistent representations, rendering the direct application of common multi-fidelity techniques challenging. The proposed approach uses manifold alignment to fuse inconsistent fields from high- and low-fidelity simulations by individually projecting their solutions onto a common latent space. Hence, simulations using incompatible grids or geometries can be combined into a single multi-fidelity reduced-order model without additional manipulation of the data. This method is applied to a variety of multi-fidelity scenarios using a transonic airfoil problem. In most cases, the new multi-fidelity reduced-order model achieves comparable predictive accuracy at a lower computational cost. Furthermore, it is demonstrated that the proposed method can combine disparate fields without any adverse effect on predictive performance.
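
The following sketch shows the manifold-alignment idea in its simplest form: each fidelity gets its own POD basis, and a rotation fitted on runs that share parameter values maps the low-fidelity latent coordinates into the high-fidelity latent space. All matrices, dimensions, and the Procrustes-based alignment are illustrative assumptions rather than the authors' transonic-airfoil setup.

```python
# Sketch: fuse field data from incompatible grids via separate POD bases plus
# an orthogonal alignment of the two latent spaces.
import numpy as np
from scipy.linalg import orthogonal_procrustes

rng = np.random.default_rng(2)
r = 5                                      # shared latent (reduced) dimension

# Placeholder snapshot matrices: columns are field solutions on incompatible grids.
S_hf = rng.normal(size=(2000, 12))         # 12 expensive high-fidelity snapshots
S_lf = rng.normal(size=(500, 60))          # 60 cheap low-fidelity snapshots

# Individual POD bases, one per fidelity.
U_hf, _, _ = np.linalg.svd(S_hf, full_matrices=False)
U_lf, _, _ = np.linalg.svd(S_lf, full_matrices=False)
B_hf, B_lf = U_hf[:, :r], U_lf[:, :r]

# Latent coordinates of each snapshot set in its own reduced space.
Z_hf = B_hf.T @ S_hf                       # shape (r, 12)
Z_lf = B_lf.T @ S_lf                       # shape (r, 60)

# Align the latent spaces with a rotation fitted on runs that share parameter
# values (here, assume the first 12 LF runs match the 12 HF runs).
R, _ = orthogonal_procrustes(Z_lf[:, :12].T, Z_hf.T)
Z_lf_aligned = R.T @ Z_lf                  # all LF data expressed in the HF latent space
```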


2021
Author(s): Anh Tran

Abstract Bayesian optimization (BO) is a flexible and powerful framework that is suitable for computationally expensive simulation-based applications and guarantees statistical convergence to the global optimum. While it remains one of the most popular optimization methods, its capability is hindered by the size of the data, the dimensionality of the considered problem, and the sequential nature of the optimization. These scalability issues are intertwined with each other and must be tackled simultaneously. In this work, we propose the Scalable3-BO framework, which employs a sparse Gaussian process (GP) as the underlying surrogate model to cope with big data and is equipped with a random embedding to efficiently optimize high-dimensional problems with low effective dimensionality. The Scalable3-BO framework is further extended with an asynchronous parallelization feature, which fully exploits the computational resources of an HPC system within a computational budget. As a result, the proposed Scalable3-BO framework is scalable in three independent respects: data size, dimensionality, and computational resources on HPC. The goal of this work is to push the frontiers of BO beyond its well-known scalability issues and to minimize the wall-clock waiting time when optimizing high-dimensional, computationally expensive applications. We demonstrate the capability of Scalable3-BO with 1 million data points on 10,000-dimensional problems, using 20 concurrent workers in an HPC environment.
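
As a sketch of the random-embedding ingredient (in the spirit of REMBO-style methods rather than the Scalable3-BO code itself), candidates are proposed in a low-dimensional latent space and mapped through a fixed random matrix before the expensive objective is evaluated; the acquisition step that a sparse GP would normally drive is replaced here by random latent sampling for brevity.

```python
# Random-embedding sketch for high-dimensional optimization with low
# effective dimensionality. The objective and dimensions are placeholders.
import numpy as np

rng = np.random.default_rng(3)
D, d = 10_000, 10                         # ambient and effective dimensions
A = rng.normal(size=(D, d))               # fixed random embedding matrix

def expensive_objective(x):               # stand-in for the real simulation
    return float(np.sum(x[:5] ** 2))      # only a few dimensions actually matter

def embed(z, low=-1.0, high=1.0):
    return np.clip(A @ z, low, high)      # map latent point into the box bounds

# Very simplified loop: in real BO the candidate z would come from maximizing
# an acquisition function built on a (sparse) GP fitted to past (z, y) pairs.
best_y, best_z = np.inf, None
for _ in range(20):
    z = rng.uniform(-np.sqrt(d), np.sqrt(d), size=d)
    y = expensive_objective(embed(z))
    if y < best_y:
        best_y, best_z = y, z
```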


Electronics, 2021, Vol 10 (16), pp. 1973
Author(s): Daniel S. Soper

Selecting a final machine learning (ML) model typically occurs after a process of hyperparameter optimization in which many candidate models with varying structural properties and algorithmic settings are evaluated and compared. Evaluating each candidate model commonly relies on k-fold cross validation, wherein the data are randomly subdivided into k folds, with each fold being iteratively used as a validation set for a model that has been trained using the remaining folds. While many research studies have sought to accelerate ML model selection by applying metaheuristic and other search methods to the hyperparameter space, no consideration has been given to the k-fold cross validation process itself as a means of rapidly identifying the best-performing model. The current study rectifies this oversight by introducing a greedy k-fold cross validation method and demonstrating that greedy k-fold cross validation can vastly reduce the average time required to identify the best-performing model when given a fixed computational budget and a set of candidate models. This improved search time is shown to hold across a variety of ML algorithms and real-world datasets. For scenarios without a computational budget, this paper also introduces an early stopping algorithm based on the greedy cross validation method. The greedy early stopping method is shown to outperform a competing, state-of-the-art early stopping method both in terms of search time and the quality of the ML models selected by the algorithm. Since hyperparameter optimization is among the most time-consuming, computationally intensive, and monetarily expensive tasks in the broader process of developing ML-based solutions, the ability to rapidly identify optimal machine learning models using greedy cross validation has obvious and substantial benefits to organizations and researchers alike.
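
One plausible reading of the greedy idea is sketched below: fold evaluations are allocated one at a time to the candidate model with the best running mean score until a fixed budget of evaluations is spent. The candidate grid, budget, and dataset are illustrative assumptions, not taken from the paper.

```python
# Greedy allocation of k-fold cross-validation work under a fixed budget (sketch).
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = load_iris(return_X_y=True)
folds = list(KFold(n_splits=5, shuffle=True, random_state=0).split(X, y))
candidates = [LogisticRegression(C=c, max_iter=1000) for c in (0.01, 0.1, 1.0, 10.0)]

scores = [[] for _ in candidates]          # fold scores observed so far
next_fold = [0] * len(candidates)          # index of each candidate's next fold
budget = 12                                # total fold evaluations allowed

for _ in range(budget):
    # Give the next fold to the candidate with the best running mean score
    # (unevaluated candidates get priority so everyone starts with one fold).
    runnable = [i for i in range(len(candidates)) if next_fold[i] < len(folds)]
    pick = max(runnable, key=lambda i: np.mean(scores[i]) if scores[i] else np.inf)
    tr, va = folds[next_fold[pick]]
    model = clone(candidates[pick]).fit(X[tr], y[tr])
    scores[pick].append(model.score(X[va], y[va]))
    next_fold[pick] += 1

winner = max(range(len(candidates)), key=lambda i: np.mean(scores[i]))
print("Selected model:", candidates[winner])
```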


2021, Vol 31 (4), pp. 1-36
Author(s): Ran Yang, David Kent, Daniel W. Apley, Jeremy Staum, David Ruppert

Many two-level nested simulation applications involve the conditional expectation of some response variable, where the expected response is the quantity of interest, and the expectation is with respect to the inner-level random variables, conditioned on the outer-level random variables. The latter typically represent random risk factors, and risk can be quantified by estimating the probability density function (pdf) or cumulative distribution function (cdf) of the conditional expectation. Much prior work has considered a naïve estimator that uses the empirical distribution of the sample averages across the inner-level replicates. This results in a biased estimator, because the distribution of the sample averages is over-dispersed relative to the distribution of the conditional expectation when the number of inner-level replicates is finite. Whereas most prior work has focused on allocating the numbers of outer- and inner-level replicates to balance the bias/variance tradeoff, we develop a bias-corrected pdf estimator. Our approach is based on the concept of density deconvolution, which is widely used to estimate densities with noisy observations but has not previously been considered for nested simulation problems. For a fixed computational budget, the bias-corrected deconvolution estimator allows more outer-level and fewer inner-level replicates to be used, which substantially improves the efficiency of the nested simulation.
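
A toy calculation makes the over-dispersion explicit: with a finite number of inner replicates, the variance of the inner-level sample averages exceeds that of the conditional expectation by the inner variance divided by the number of inner replicates, which is exactly the noise a deconvolution-style estimator would remove. The Gaussian model below is purely illustrative.

```python
# Illustration of the bias in the naive nested-simulation estimator.
import numpy as np

rng = np.random.default_rng(4)
M, n, sigma = 5000, 10, 2.0                  # outer reps, inner reps, inner noise

Z = rng.normal(size=M)                       # outer-level risk factors
mu = Z                                       # true conditional expectation E[Y | Z] = Z
Y_bar = mu + sigma / np.sqrt(n) * rng.normal(size=M)   # inner-level sample averages

print("Var of conditional expectation:", mu.var())     # ~1.0
print("Var of naive sample averages:  ", Y_bar.var())  # ~1.0 + sigma**2 / n = 1.4
# A deconvolution estimator targets the density of mu by removing the known
# sigma**2 / n noise component instead of smoothing the over-dispersed Y_bar.
```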


Author(s): Rajshekhar Singhania, Chinmay Sawkar, Manoj K. Tiwari

Abstract In this article, the problem of optimal sensor deployment in large-scale manufacturing systems for effective process monitoring is solved using a variant of the ant colony system (ACS) algorithm to obtain an optimal number of sensors, their types, and their locations for monitoring various possible faults. For this purpose, we first establish the need for optimizing sensor deployment in large-scale manufacturing processes, motivated by the increasing use of Wireless Sensor Networks (WSNs) as an architectural framework for Machine-to-Machine (M2M) communications and Cyber-Physical Systems (CPS). A multi-objective formulation of optimal sensor deployment in a single-station, multi-step manufacturing process, which accounts for sensor costs, system reliability, and stability, is then briefly explained. As noted earlier by several authors, the sensor deployment problem is a set covering problem, and metaheuristics such as genetic algorithms and variants of ant colony algorithms are known to be inefficient at finding near-optimal solutions to large-scale set covering problems within a small computational budget. To address this, a Convergence Trajectory Controlled ant colony system is developed and applied to a case study of an automated assembly robot. To demonstrate the convergence power of the developed algorithm, we also apply it to several large-scale benchmark instances of the set covering problem and compare the results with those obtained by the ant colony system algorithm. The results show that the developed algorithm can reach a near-optimal solution with a smaller computational budget than the ACS algorithm.
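
For concreteness, the sketch below shows a generic ant colony system applied to a tiny set covering instance, with the pseudo-random proportional column choice, local pheromone update, and global update on the best solution; it is a plain ACS, not the convergence-trajectory-controlled variant developed in the article, and the instance and parameters are made up.

```python
# Generic ant colony system (ACS) for a toy set covering instance.
import random

# Instance: rows 0..4 must be covered; each column is (cost, set of rows it covers).
columns = [(3, {0, 1}), (2, {1, 2}), (4, {2, 3, 4}), (1, {0}), (2, {3, 4})]
ROWS = {0, 1, 2, 3, 4}

tau0, rho, beta, q0 = 0.1, 0.1, 2.0, 0.9
tau = [tau0] * len(columns)                          # pheromone per column

def construct():
    covered, picked, cost = set(), [], 0
    while covered != ROWS:
        cands = [j for j in range(len(columns)) if columns[j][1] - covered]
        def score(j):
            gain = len(columns[j][1] - covered)      # newly covered rows
            return tau[j] * (gain / columns[j][0]) ** beta
        if random.random() < q0:                     # exploit: best-scoring column
            j = max(cands, key=score)
        else:                                        # explore: roulette wheel
            total = sum(score(j) for j in cands)
            r, acc = random.random() * total, 0.0
            for j in cands:
                acc += score(j)
                if acc >= r:
                    break
        tau[j] = (1 - rho) * tau[j] + rho * tau0     # local pheromone update
        covered |= columns[j][1]
        picked.append(j)
        cost += columns[j][0]
    return picked, cost

best_sol, best_cost = None, float("inf")
for _ in range(50):                                  # iterations
    for _ in range(5):                               # ants per iteration
        sol, c = construct()
        if c < best_cost:
            best_sol, best_cost = sol, c
    for j in best_sol:                               # global update on best solution
        tau[j] = (1 - rho) * tau[j] + rho / best_cost

print("Best cover:", best_sol, "cost:", best_cost)
```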


Author(s): Haibo Yu, Li Kang, Ying Tan, Jianchao Zeng, Chaoli Sun

Abstract Surrogate models are commonly used to reduce the number of required expensive fitness evaluations when optimizing computationally expensive problems. Although many competitive surrogate-assisted evolutionary algorithms have been proposed, it remains a challenging issue to develop an effective model management strategy to address problems with different landscape features under a limited computational budget. This paper adopts a coarse-to-fine evaluation scheme based on two surrogate models, i.e., a coarse Gaussian process and a fine radial basis function, for assisting a differential evolution algorithm to solve computationally expensive optimization problems. The coarse Gaussian process model is meant to capture the general contour of the fitness landscape to estimate the fitness and its degree of uncertainty. A surrogate-assisted environmental selection strategy is then developed according to the non-dominance relationship between the approximated fitness and the estimated uncertainty. Meanwhile, the fine radial basis function model aims to learn the details of the local fitness landscape to refine the approximation quality of the new parent population and to find the local optima for real evaluations. The performance and scalability of the proposed method are extensively evaluated on two sets of widely used benchmark problems. Experimental results show that the proposed method can outperform several state-of-the-art algorithms within a limited computational budget.
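
A rough sketch of the coarse-to-fine screening idea follows: a Gaussian process supplies a global fitness estimate with uncertainty for cheap prescreening of offspring, while a radial basis function interpolant fitted around the incumbent refines the local pick that receives a real evaluation. The toy objective, archive sizes, and selection rules are assumptions, not the authors' algorithm.

```python
# Coarse (global GP) + fine (local RBF) surrogate screening of candidate offspring.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(5)

def expensive_f(X):                                      # stand-in objective
    return np.sum((X - 0.3) ** 2, axis=1)

X_db = rng.uniform(0, 1, size=(40, 5))                   # archive of evaluated points
y_db = expensive_f(X_db)

# Coarse model: global GP giving predicted fitness and its uncertainty.
gp = GaussianProcessRegressor(normalize_y=True).fit(X_db, y_db)
offspring = rng.uniform(0, 1, size=(30, 5))              # e.g. DE trial vectors
mean, std = gp.predict(offspring, return_std=True)
promising = offspring[np.argsort(mean)[:5]]              # low predicted fitness
uncertain = offspring[np.argsort(-std)[:2]]              # high estimated uncertainty

# Fine model: local RBF fitted to the neighbourhood of the current best point.
best = X_db[np.argmin(y_db)]
near = np.argsort(np.linalg.norm(X_db - best, axis=1))[:15]
rbf = RBFInterpolator(X_db[near], y_db[near])
local_pick = promising[np.argmin(rbf(promising))]

# Only the screened candidates are sent to the expensive fitness function.
to_evaluate = np.vstack([promising[:1], uncertain[:1], local_pick[None]])
new_y = expensive_f(to_evaluate)
```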

