An Automated Machine Learning-Genetic Algorithm (AutoML-GA) Framework With Active Learning for Design Optimization

Author(s):  
Opeoluwa Owoyele ◽  
Pinaki Pal ◽  
Alvaro Vidal Torreira

Abstract: The use of machine learning (ML)-based surrogate models is a promising technique to significantly accelerate simulation-based design optimization of internal combustion (IC) engines, given the high computational cost of running computational fluid dynamics (CFD) simulations. However, surrogate-based optimization for IC engine applications suffers from two main issues. First, training ML models requires hyperparameter selection, which often involves trial and error combined with domain expertise. Second, the data required to train these models are often unknown a priori. In this work, we present an automated hyperparameter selection technique coupled with an active learning approach to address these challenges. The technique uses a Bayesian approach to optimize the hyperparameters of the base learners that make up a super learner model, in order to obtain better test performance. In addition to performing hyperparameter optimization (HPO), an active learning approach is employed, in which data generation using simulations, ML training, and surrogate optimization are performed repeatedly to refine the solution in the vicinity of the predicted optimum. The proposed approach is applied to the optimization of a compression ignition engine with control parameters relating to fuel injection, in-cylinder flow, and thermodynamic conditions. It is demonstrated that by automatically selecting the best values of the hyperparameters, a 1.6% improvement in merit value is obtained, compared to an improvement of 1.0% with default hyperparameters. Overall, the framework introduced in this study reduces the need for technical expertise in training ML models for optimization, while also reducing the number of simulations needed for performing surrogate-based design optimization.
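A super learner is, in general, a weighted ensemble whose weights are chosen to minimise cross-validated error. The abstract does not say which base learners or weighting scheme the authors use, so the following is only an illustrative sketch under assumed choices: two hypothetical one-feature base learners (`fit_linear`, a least-squares line, and `fit_knn`, k-nearest-neighbour regression) are blended convexly, with the blend weight fitted to their out-of-fold predictions. The HPO step described above would tune each base learner's hyperparameters (for example `k`) before this blending; that step is not shown.

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y = m*x + c with a single input feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    m = sxy / sxx
    c = my - m * mx
    return lambda x: m * x + c

def fit_knn(xs, ys, k=3):
    """k-nearest-neighbour regression: average the k closest training targets."""
    data = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(data, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict

def out_of_fold(fit, xs, ys, n_folds=5):
    """Predict every sample with a model that never saw it during training."""
    preds = [0.0] * len(xs)
    for f in range(n_folds):
        train = [i for i in range(len(xs)) if i % n_folds != f]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(f, len(xs), n_folds):
            preds[i] = model(xs[i])
    return preds

def super_learner_weight(xs, ys):
    """Weight w in [0, 1] for the blend w*linear + (1-w)*knn that minimises
    the cross-validated squared error (closed form for two learners)."""
    a = out_of_fold(fit_linear, xs, ys)
    b = out_of_fold(fit_knn, xs, ys)
    d = [ai - bi for ai, bi in zip(a, b)]
    r = [yi - bi for yi, bi in zip(ys, b)]
    denom = sum(di * di for di in d)
    if denom == 0.0:
        return 0.5  # the learners agree everywhere, so any blend is equivalent
    w = sum(di * ri for di, ri in zip(d, r)) / denom
    return min(1.0, max(0.0, w))  # clip the unconstrained optimum to [0, 1]

xs = [i / 10 for i in range(30)]
ys = [2.0 * x + 1.0 for x in xs]  # purely linear toy data
w = super_learner_weight(xs, ys)
print(w)  # the linear learner should receive (almost) all of the weight
```

With only two learners the constrained minimiser is simply the unconstrained least-squares weight clipped to [0, 1]; with more base learners one would typically solve a non-negative least-squares problem for the weights instead.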

2021 ◽  
Vol 143 (8) ◽  
Author(s):  
Opeoluwa Owoyele ◽  
Pinaki Pal ◽  
Alvaro Vidal Torreira

Abstract: The use of machine learning (ML)-based surrogate models is a promising technique to significantly accelerate simulation-driven design optimization of internal combustion (IC) engines, due to the high computational cost of running computational fluid dynamics (CFD) simulations. However, training the ML models requires hyperparameter selection, which is often done using trial-and-error and domain expertise. Another challenge is that the data required to train these models are often unknown a priori. In this work, we present an automated hyperparameter selection technique coupled with an active learning approach to address these challenges. The technique presented in this study involves the use of a Bayesian approach to optimize the hyperparameters of the base learners that make up a super learner model. In addition to performing hyperparameter optimization (HPO), an active learning approach is employed, where the process of data generation using simulations, ML training, and surrogate optimization is performed repeatedly to refine the solution in the vicinity of the predicted optimum. The proposed approach is applied to the optimization of a compression ignition engine with control parameters relating to fuel injection, in-cylinder flow, and thermodynamic conditions. It is demonstrated that by automatically selecting the best values of the hyperparameters, a 1.6% improvement in merit value is obtained, compared to an improvement of 1.0% with default hyperparameters. Overall, the framework introduced in this study reduces the need for technical expertise in training ML models for optimization while also reducing the number of simulations needed for performing surrogate-based design optimization.
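The active learning loop described above (simulate, retrain the surrogate, optimise the surrogate, then simulate again near the predicted optimum) can be sketched as follows. Every concrete choice is an assumption made for illustration: a cheap analytic `merit` function stands in for the CFD simulation, an inverse-distance-weighted interpolant stands in for the trained super learner, `ga_maximize` is a minimal real-coded genetic algorithm rather than the authors' GA, and the halving refinement radius is arbitrary.

```python
import random

def idw_surrogate(X, y, power=2.0):
    """Inverse-distance-weighted interpolant of all samples seen so far
    (a hypothetical stand-in for the trained ML surrogate)."""
    def predict(p):
        num = den = 0.0
        for xi, yi in zip(X, y):
            d2 = sum((a - b) ** 2 for a, b in zip(p, xi))
            if d2 == 0.0:
                return yi  # exact hit on a training point
            w = 1.0 / d2 ** (power / 2)
            num += w * yi
            den += w
        return num / den
    return predict

def clip(v, lo, hi):
    return max(lo, min(hi, v))

def ga_maximize(f, dim, lo, hi, rng, pop=30, gens=25, sigma=0.1):
    """Tiny real-coded GA on the cheap surrogate f: truncation selection,
    blend crossover and Gaussian mutation, clipped to the bounds."""
    P = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop)]
    for _ in range(gens):
        P.sort(key=f, reverse=True)   # rank by predicted merit
        parents = P[: pop // 2]       # keep the better half
        children = []
        while len(parents) + len(children) < pop:
            a, b = rng.sample(parents, 2)
            t = rng.random()          # blend crossover weight
            children.append([clip(t * ai + (1 - t) * bi + rng.gauss(0.0, sigma * (hi - lo)), lo, hi)
                             for ai, bi in zip(a, b)])
        P = parents + children
    return max(P, key=f)

def active_learning(objective, dim, lo, hi, n_init=8, n_iter=6, n_local=3, seed=0):
    rng = random.Random(seed)
    X = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_init)]
    y = [objective(p) for p in X]     # initial "expensive" simulations
    history = [max(y)]
    radius = 0.25 * (hi - lo)
    for _ in range(n_iter):
        surrogate = idw_surrogate(X, y)                     # 1) retrain on all data
        x_star = ga_maximize(surrogate, dim, lo, hi, rng)   # 2) optimise the surrogate
        # 3) simulate the predicted optimum plus a few points around it
        batch = [x_star] + [[clip(v + rng.gauss(0.0, radius), lo, hi) for v in x_star]
                            for _ in range(n_local)]
        X += batch
        y += [objective(p) for p in batch]
        history.append(max(y))
        radius *= 0.5                 # tighten the refinement region
    best = max(range(len(y)), key=y.__getitem__)
    return X[best], y[best], history

def merit(p):
    # Toy merit function (hypothetical) with its maximum of 0 at (0.3, -0.2)
    return -((p[0] - 0.3) ** 2 + (p[1] + 0.2) ** 2)

x_best, y_best, hist = active_learning(merit, dim=2, lo=-1.0, hi=1.0)
print(x_best, y_best, hist)
```

The best merit found can only stay the same or improve from one iteration to the next, because every iteration adds simulated points to the data set rather than replacing them.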


Author(s):  
Opeoluwa Owoyele ◽  
Pinaki Pal

Abstract: In this work, a novel design optimization technique based on active learning is presented, which involves dynamic exploration and exploitation of the design space of interest using an ensemble of machine learning algorithms. The approach employs a hybrid methodology that incorporates an explorative weak learner (a regularized basis function model), which fits high-level information about the response surface, and an exploitative strong learner (based on a committee machine), which fits finer details around promising regions identified by the weak learner. In each design iteration, an aristocratic approach is used to select a set of nominees: points whose merit value, as predicted by the weak learner, meets a threshold are selected for expensive function evaluation. In addition to these points, the global optimum predicted by the strong learner is also evaluated, enabling rapid convergence to the actual global optimum once the optimizer has identified the most promising region. The methodology is first tested on the optimization of a two-dimensional multi-modal surface, and its performance is compared with that of traditional global optimization methods, namely the micro-genetic algorithm (μGA) and particle swarm optimization (PSO). The new optimizer is shown to reach the global optimum much faster, with significantly fewer function evaluations. Subsequently, the new optimizer is applied to a complex internal combustion (IC) engine combustion optimization case with nine control parameters related to fuel injection, initial thermodynamic conditions, and in-cylinder flow. Again, the new approach significantly lowers the number of function evaluations needed to reach the optimum design configuration (by up to 80%) compared with particle swarm and genetic algorithm-based optimization techniques.
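The aristocratic nominee-selection step can be illustrated with a short sketch. The weak and strong learners are passed in as plain callables; the toy quadratics in the usage example, the `committee` helper (an unweighted average of member models, one simple form of committee machine), the `threshold` value and the candidate grid are all hypothetical stand-ins, not the regularized basis function model or committee machine used in the paper.

```python
from typing import Callable, List, Sequence

Model = Callable[[float], float]

def committee(members: Sequence[Model]) -> Model:
    """Committee machine in its simplest form: average the members' predictions."""
    return lambda x: sum(m(x) for m in members) / len(members)

def select_nominees(candidates: Sequence[float], weak: Model, strong: Model,
                    threshold: float) -> List[float]:
    """Aristocratic selection: nominate every candidate whose weak-learner merit
    meets the threshold (best first), then add the strong learner's predicted
    optimum so that it is also sent to the expensive evaluation."""
    ranked = sorted(candidates, key=weak, reverse=True)
    nominees = [c for c in ranked if weak(c) >= threshold]
    best_strong = max(candidates, key=strong)
    if best_strong not in nominees:
        nominees.append(best_strong)
    return nominees

candidates = [i / 10 for i in range(11)]          # coarse 1-D design grid
weak = lambda x: -(x - 0.5) ** 2                  # smooth, explorative fit
strong = committee([lambda x: -(x - 0.62) ** 2,   # finer, exploitative fits
                    lambda x: -(x - 0.58) ** 2])
print(select_nominees(candidates, weak, strong, threshold=-0.005))  # [0.5, 0.6]
```

Here the weak learner alone would only nominate 0.5; the strong learner's optimum at 0.6 is appended, which is how the hybrid scheme lets the finer model steer evaluations once a promising region has been found.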


2020 ◽  
Vol 34 (04) ◽  
pp. 3537-3544
Author(s):  
Xu Chen ◽  
Brett Wujek

Automated machine learning (AutoML) strives to establish an appropriate machine learning model for any dataset automatically, with minimal human intervention. Although extensive research has been conducted on AutoML, most of it has focused on supervised learning. Research on automated semi-supervised learning and active learning algorithms is still limited. Implementation becomes more challenging when the algorithm is designed for a distributed computing environment. With this as motivation, we propose a novel automated learning system for distributed active learning (AutoDAL) to address these challenges. First, automated graph-based semi-supervised learning is conducted by aggregating the proposed cost functions from different compute nodes in a distributed manner. Subsequently, automated active learning is addressed by jointly optimizing hyperparameters in both the classification and query selection stages, leveraging graph loss minimization and entropy regularization. Moreover, we propose an efficient distributed active learning algorithm that scales to big data: in the classification stage, the unlabeled data are partitioned and the labeled data are replicated across worker nodes, and in the query selection stage, the data are aggregated in the controller. The proposed AutoDAL algorithm is applied to multiple benchmark datasets and a real-world electrocardiogram (ECG) dataset for classification. We demonstrate that AutoDAL achieves significantly better performance than several state-of-the-art AutoML approaches and active learning algorithms.
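The distributed query-selection pattern (partition the unlabeled pool across workers, score points locally, aggregate in the controller) can be sketched with entropy-based uncertainty sampling. This is a generic stand-in, not the AutoDAL algorithm: the graph-based semi-supervised classifier and its jointly tuned hyperparameters are replaced by a hypothetical `predict_proba` function, and the workers are simulated sequentially in a single process.

```python
import heapq
import math

def entropy(probs):
    """Shannon entropy (in nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def partition(items, n_workers):
    """Round-robin split of the unlabeled pool across workers."""
    return [items[w::n_workers] for w in range(n_workers)]

def worker_top_k(shard, predict_proba, k):
    """Each worker scores only its own shard and reports its k most uncertain points."""
    return heapq.nlargest(k, ((entropy(predict_proba(x)), x) for x in shard))

def distributed_query(unlabeled, predict_proba, k, n_workers=4):
    """Controller: merge the workers' local top-k lists into the global top-k.
    This is exact, since each global top-k point is in its own shard's top-k."""
    local = [worker_top_k(s, predict_proba, k) for s in partition(unlabeled, n_workers)]
    merged = heapq.nlargest(k, (item for lst in local for item in lst))
    return [x for _, x in merged]

def predict_proba(x):
    # Hypothetical binary classifier whose decision boundary sits at x = 50
    s = 1.0 / (1.0 + math.exp(-(x - 50) / 5.0))
    return [s, 1.0 - s]

pool = list(range(100))
picked = distributed_query(pool, predict_proba, k=3)
print(picked)  # the points closest to the decision boundary
```

Only k scored items per worker travel to the controller, so communication grows with the number of workers and k rather than with the size of the unlabeled pool.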


Author(s):  
Viviana Gómez-Orozco ◽  
Iván De La Pava Panche ◽  
Andrés Marino Álvarez-Meza ◽  
Mauricio Alexander Álvarez-López ◽  
Álvaro Ángel Orozco-Gutiérrez

Adjusting the stimulation parameters is a challenge in deep brain stimulation (DBS) therapy due to the vast number of different configurations available. As a result, systems based on the visualization of the volume of tissue activated (VTA) produced by a particular stimulation setting have been developed. However, the medical specialist still has to search, by trial and error, for a DBS set-up that generates the desired VTA. Therefore, our goal is to develop a DBS parameter tuning strategy for current clinical devices that allows a target VTA to be defined under biophysically viable constraints. We propose a machine learning approach that estimates the DBS parameter values for a given VTA and comprises two main stages: (i) a K-nearest neighbors-based deformation that defines a target VTA while preserving biophysically viable constraints; and (ii) a parameter estimation stage consisting of a data projection using metric learning, to highlight relevant VTA properties, followed by a regression/classification algorithm that estimates the DBS parameters that generate the target VTA. Our methodology allows a biophysically compliant target VTA to be set and accurately predicts the required configuration of stimulation parameters. The performance of our approach is also stable for both isotropic and anisotropic tissue conductivities. Furthermore, the computational cost of the trained system is acceptable for real-world implementations.


2020 ◽  
Author(s):  
Jorge E. Rabinovich ◽  
Agustín Alvarez Costa ◽  
Ignacio Muñoz ◽  
Pablo E. Schilman ◽  
Nicholas Fountain-Jones

Abstract: Species Distribution Modelling (SDM) determines the habitat suitability of a species across geographic areas using macro-climatic variables; however, micro-habitats can buffer or exacerbate the influence of macro-climatic variables, requiring links between physiology and species persistence. Experimental approaches linking species physiology to micro-climate are complex, time-consuming, and expensive. For example, it is difficult to judge a priori which combination of exposure time and temperature matters for a species' thermal tolerance. We tackled this problem using an active learning approach that used machine learning methods to guide thermal tolerance experimental design for three kissing-bug species (Hemiptera: Reduviidae: Triatominae), vectors of the parasite that causes Chagas disease. As with other pathogen vectors, triatomines are well known to use micro-habitats, and the associated shift in microclimate, to enhance survival. Using a limited dataset collected from the literature, our approach showed that temperature and exposure time, in that order, were the strongest predictors of mortality; species played a minor role, and life stage was the least important. Further, we identified complex but biologically plausible nonlinear interactions between temperature and exposure time in shaping mortality, which together set the potential thermal limits of triatomines. These results led to the design of new experiments whose laboratory results produced novel insights into the effects of temperature and exposure on triatomines. These results, in turn, can be used to better model the micro-climatic envelope of the species. Here we demonstrate the power of an active learning approach for exploring experimental space to design laboratory studies that test species' thermal limits. Our analytical pipeline can be easily adapted to other systems, and we provide code to allow practitioners to perform similar analyses. Not only does our approach have the potential to save time and money, but it can also increase our understanding of the links between species physiology and climate, a topic of increasing ecological importance.

Author summary: Species Distribution Modelling determines the habitat suitability of a species across geographic areas using macro-climatic variables; however, micro-habitats can buffer or exacerbate the influence of macro-climatic variables, requiring links between physiology and species persistence. We tackled the problem of how the combination of exposure time and temperature (a combination difficult to judge a priori) determines species' thermal tolerance, using an active learning approach that used machine learning methods to guide thermal tolerance experimental design for three kissing-bug species, vectors of the parasite that causes Chagas disease. These bugs use micro-habitats, and the associated shifts in microclimate, to enhance survival. Using a limited dataset collected from the literature, we showed that temperature and exposure time, in that order, were the strongest predictors of mortality, that species played a minor role, that life stage was the least important, and that a complex nonlinear interaction between temperature and exposure time shapes the mortality of kissing bugs. These results led to the design of new laboratory experiments to assess the effects of temperature and exposure on triatomines, and they can be used to better model the micro-climatic envelope of the species. Our active learning approach for exploring experimental space to design laboratory studies can also be applied to other environmental conditions or species.


2019 ◽  
Vol 116 (9) ◽  
pp. 3401-3406 ◽  
Author(s):  
David M. Wilkins ◽  
Andrea Grisafi ◽  
Yang Yang ◽  
Ka Un Lao ◽  
Robert A. DiStasio ◽  
...  

The molecular dipole polarizability describes the tendency of a molecule to change its dipole moment in response to an applied electric field. This quantity governs key intra- and intermolecular interactions, such as induction and dispersion; plays a vital role in determining the spectroscopic signatures of molecules; and is an essential ingredient in polarizable force fields. Compared with other ground-state properties, an accurate prediction of the molecular polarizability is considerably more difficult, as this response quantity is quite sensitive to the underlying electronic structure description. In this work, we present highly accurate quantum mechanical calculations of the static dipole polarizability tensors of 7,211 small organic molecules computed using linear response coupled cluster singles and doubles theory (LR-CCSD). Using a symmetry-adapted machine-learning approach, we demonstrate that it is possible to predict the LR-CCSD molecular polarizabilities of these small molecules with an error that is an order of magnitude smaller than that of hybrid density functional theory (DFT) at a negligible computational cost. The resultant model is robust and transferable, yielding molecular polarizabilities for a diverse set of 52 larger molecules (including challenging conjugated systems, carbohydrates, small drugs, amino acids, nucleobases, and hydrocarbon isomers) at an accuracy that exceeds that of hybrid DFT. The atom-centered decomposition implicit in our machine-learning approach offers some insight into the shortcomings of DFT in the prediction of this fundamental quantity of interest.
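For reference, the quantity being learned and the atom-centred decomposition mentioned above can be written as follows. This is a schematic summary in standard linear-response notation; the symbols (the applied field $\mathbf{E}$, the dipole moment $\boldsymbol{\mu}$, the number of atoms $N$, and the local environment $\mathcal{X}_i$ of atom $i$) are introduced here for illustration and are not taken from the paper.

```latex
\alpha_{ab} = \left.\frac{\partial \mu_a}{\partial E_b}\right|_{\mathbf{E}=0},
\qquad
\Delta\mu_a \approx \sum_{b \in \{x,y,z\}} \alpha_{ab}\, E_b,
\qquad
\boldsymbol{\alpha} \approx \sum_{i=1}^{N} \boldsymbol{\alpha}_i(\mathcal{X}_i),
\qquad
\bar{\alpha} = \tfrac{1}{3}\operatorname{Tr}\boldsymbol{\alpha}
```

The first two relations define the static dipole polarizability tensor as the linear response of the dipole to a weak field; the third expresses the molecular tensor as a sum of atomic contributions, each depending only on its atom's environment, which is what makes the per-atom contributions inspectable; the last gives the isotropic (rotationally averaged) polarizability.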


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Rhys E. A. Goodall ◽  
Alpha A. Lee

Abstract: Machine learning has the potential to accelerate materials discovery by accurately predicting materials properties at a low computational cost. However, the model inputs remain a key stumbling block. Current methods typically use descriptors constructed either from knowledge of the full crystal structure, and therefore only applicable to materials with already characterised structures, or from structure-agnostic fixed-length representations hand-engineered from the stoichiometry. We develop a machine learning approach that takes only the stoichiometry as input and automatically learns appropriate and systematically improvable descriptors from data. Our key insight is to treat the stoichiometric formula as a dense weighted graph between elements. Compared to the state of the art for structure-agnostic methods, our approach achieves lower errors with less data.
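Treating a stoichiometric formula as a dense weighted graph can be made concrete with a short sketch. It only builds the input graph: element nodes weighted by fractional abundance, with every ordered pair of distinct elements connected. The learned model that operates on this graph is not shown, and details such as whether self-edges are included are assumptions here. The flat-formula parser `parse_formula` is a hypothetical helper that does not handle parentheses or hydrates.

```python
import re
from itertools import permutations

def parse_formula(formula):
    """Parse a flat formula such as 'Fe2O3' into {element: amount}.
    Parentheses and hydrate notation are not handled in this sketch."""
    counts = {}
    for el, num in re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula):
        counts[el] = counts.get(el, 0.0) + (float(num) if num else 1.0)
    return counts

def stoichiometry_graph(formula):
    """Nodes: elements weighted by fractional abundance.
    Edges: every ordered pair of distinct elements (a dense graph)."""
    counts = parse_formula(formula)
    total = sum(counts.values())
    nodes = {el: n / total for el, n in counts.items()}
    edges = list(permutations(nodes, 2))
    return nodes, edges

nodes, edges = stoichiometry_graph("Fe2O3")
print(nodes)  # {'Fe': 0.4, 'O': 0.6}
print(edges)  # [('Fe', 'O'), ('O', 'Fe')]
```

Because the node weights are fractions rather than raw counts, formulas that differ only by a common multiple (for example Fe2O3 and Fe4O6) map to the same graph.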

