An Automated Machine Learning-Genetic Algorithm Framework With Active Learning for Design Optimization

Opeoluwa Owoyele; Pinaki Pal; Alvaro Vidal Torreira

doi:10.1115/1.4050489

An Automated Machine Learning-Genetic Algorithm Framework With Active Learning for Design Optimization

Journal of Energy Resources Technology ◽

10.1115/1.4050489 ◽

2021 ◽

Vol 143 (8) ◽

Author(s):

Opeoluwa Owoyele ◽

Pinaki Pal ◽

Alvaro Vidal Torreira

Keyword(s):

Machine Learning ◽

Active Learning ◽

Design Optimization ◽

Fuel Injection ◽

A Priori ◽

Computational Cost ◽

Learning Approach ◽

Data Generation ◽

Compression Ignition Engine ◽

Hyperparameter Selection

Abstract The use of machine learning (ML)-based surrogate models is a promising technique to significantly accelerate simulation-driven design optimization of internal combustion (IC) engines, due to the high computational cost of running computational fluid dynamics (CFD) simulations. However, training the ML models requires hyperparameter selection, which is often done using trial-and-error and domain expertise. Another challenge is that the data required to train these models are often unknown a priori. In this work, we present an automated hyperparameter selection technique coupled with an active learning approach to address these challenges. The technique presented in this study involves the use of a Bayesian approach to optimize the hyperparameters of the base learners that make up a super learner model. In addition to performing hyperparameter optimization (HPO), an active learning approach is employed, where the process of data generation using simulations, ML training, and surrogate optimization is performed repeatedly to refine the solution in the vicinity of the predicted optimum. The proposed approach is applied to the optimization of a compression ignition engine with control parameters relating to fuel injection, in-cylinder flow, and thermodynamic conditions. It is demonstrated that by automatically selecting the best values of the hyperparameters, a 1.6% improvement in merit value is obtained, compared to an improvement of 1.0% with default hyperparameters. Overall, the framework introduced in this study reduces the need for technical expertise in training ML models for optimization while also reducing the number of simulations needed for performing surrogate-based design optimization.
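
The loop described in this abstract (simulate, train a surrogate, optimize the surrogate, then simulate again near the predicted optimum) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the authors' implementation: `simulate` is a toy merit function standing in for a CFD run, a k-nearest-neighbour regressor replaces the super learner, a leave-one-out grid search over `k` replaces the Bayesian hyperparameter optimization, and the genetic algorithm is deliberately small.

```python
import random


def simulate(x):
    """Hypothetical stand-in for one expensive CFD run; merit peaks at (0.3, 0.7)."""
    return 1.0 - (x[0] - 0.3) ** 2 - (x[1] - 0.7) ** 2


def sq_dist(a, b):
    """Squared Euclidean distance between two design points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def knn_predict(data, x, k):
    """Surrogate (assumed): inverse-distance-weighted mean of the k nearest samples."""
    nearest = sorted(data, key=lambda d: sq_dist(d[0], x))[:k]
    weights = [1.0 / (1e-9 + sq_dist(p, x)) for p, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)


def select_k(data, candidates=(1, 2, 3, 5)):
    """Choose k by leave-one-out error (a grid search, not Bayesian optimization)."""
    def loo_error(k):
        return sum(
            (knn_predict(data[:i] + data[i + 1:], x, k) - y) ** 2
            for i, (x, y) in enumerate(data)
        )
    return min(candidates, key=loo_error)


def ga_maximize(fitness, rng, pop_size=30, generations=25, sigma=0.1):
    """Small GA on the unit square: truncation selection, blend crossover, mutation."""
    pop = [(rng.random(), rng.random()) for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(tuple(
                min(1.0, max(0.0, (u + v) / 2 + rng.gauss(0.0, sigma)))
                for u, v in zip(a, b)
            ))
        pop = parents + children
    return max(pop, key=fitness)


def active_learning(rounds=5, n_init=12, n_new=6, radius=0.1, seed=0):
    """Repeat: tune and fit the surrogate, optimize it, simulate near the optimum."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_init):
        x = (rng.random(), rng.random())
        data.append((x, simulate(x)))
    history = [max(y for _, y in data)]  # best simulated merit so far
    for _ in range(rounds):
        k = select_k(data)
        best_x = ga_maximize(lambda x: knn_predict(data, x, k), rng)
        # New simulations: the predicted optimum plus samples in its vicinity.
        new_points = [best_x]
        for _ in range(n_new - 1):
            new_points.append(tuple(
                min(1.0, max(0.0, c + rng.uniform(-radius, radius))) for c in best_x
            ))
        data.extend((x, simulate(x)) for x in new_points)
        history.append(max(y for _, y in data))
    best = max(data, key=lambda d: d[1])
    return best, history
```

Calling `active_learning()` returns the best simulated design together with the best-so-far merit after each round. The history cannot decrease, because every new simulation is added to the training data rather than replacing earlier samples.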

An Automated Machine Learning-Genetic Algorithm (AutoML-GA) Framework With Active Learning for Design Optimization

ASME 2020 Internal Combustion Engine Division Fall Technical Conference ◽

10.1115/icef2020-3000 ◽

2020 ◽

Author(s):

Opeoluwa Owoyele ◽

Pinaki Pal ◽

Alvaro Vidal Torreira

Keyword(s):

Machine Learning ◽

Active Learning ◽

Design Optimization ◽

Test Performance ◽

Computational Cost ◽

Learning Approach ◽

Data Generation ◽

Compression Ignition Engine ◽

IC Engine ◽

Hyperparameter Selection

Abstract The use of machine learning (ML)-based surrogate models is a promising technique to significantly accelerate simulation-based design optimization of IC engines, due to the high computational cost of running computational fluid dynamics (CFD) simulations. However, surrogate-based optimization for IC engine applications suffers from two main issues. First, training ML models requires hyperparameter selection, often involving trial-and-error combined with domain expertise. The second issue is that the data required to train these models is often unknown a priori. In this work, we present an automated hyperparameter selection technique coupled with an active learning approach to address these challenges. The technique presented in this study involves the use of a Bayesian approach to optimize the hyperparameters of the base learners that make up a Super Learner model to obtain better test performance. In addition to performing hyperparameter optimization (HPO), an active learning approach is employed, where the process of data generation using simulations, ML training, and surrogate optimization is performed repeatedly to refine the solution in the vicinity of the predicted optimum. The proposed approach is applied to the optimization of a compression ignition engine with control parameters relating to fuel injection, in-cylinder flow, and thermodynamic conditions. It is demonstrated that by automatically selecting the best values of the hyperparameters, a 1.6% improvement in merit value is obtained, compared to an improvement of 1.0% with default hyperparameters. Overall, the framework introduced in this study reduces the need for technical expertise in training ML models for optimization, while also reducing the number of simulations needed for performing surrogate-based design optimization.

Building attention and edge message passing neural networks for bioactivity and physical–chemical property prediction

Journal of Cheminformatics ◽

10.1186/s13321-019-0407-y ◽

2020 ◽

Vol 12 (1) ◽

Cited By ~ 9

Author(s):

M. Withnall ◽

E. Lindelöf ◽

O. Engkvist ◽

H. Chen

Keyword(s):

Machine Learning ◽

Message Passing ◽

A Priori ◽

Molecular Graph ◽

Model Performance ◽

Learning Approaches ◽

Property Prediction ◽

Physical Chemical ◽

Chemical Descriptor ◽

Hyperparameter Selection

Abstract Neural Message Passing for graphs is a promising and relatively recent approach for applying Machine Learning to networked data. As molecules can be described intrinsically as a molecular graph, it makes sense to apply these techniques to improve molecular property prediction in the field of cheminformatics. We introduce Attention and Edge Memory schemes to the existing message passing neural network framework, and benchmark our approaches against eight different physical–chemical and bioactivity datasets from the literature. We remove the need to introduce a priori knowledge of the task and chemical descriptor calculation by using only fundamental graph-derived properties. Our results consistently perform on par with other state-of-the-art machine learning approaches, and set a new standard on sparse multi-task virtual screening targets. We also investigate model performance as a function of dataset preprocessing, and make some suggestions regarding hyperparameter selection.

A Cooperative Machine Learning Approach for Pedestrian Navigation in Indoor IoT

Sensors ◽

10.3390/s19214609 ◽

2019 ◽

Vol 19 (21) ◽

pp. 4609 ◽

Cited By ~ 1

Author(s):

Marzieh Jalal Abadi ◽

Luca Luceri ◽

Mahbub Hassan ◽

Chun Tung Chou ◽

Monica Nicoli

Keyword(s):

Machine Learning ◽

A Priori ◽

Dead Reckoning ◽

Step Length ◽

Consensus Algorithm ◽

Learning Approach ◽

Indoor Environments ◽

Machine Learning Approach ◽

Multiple Devices ◽

Priori Information

This paper presents a system based on pedestrian dead reckoning (PDR) for localization of networked mobile users, which relies only on sensors embedded in the devices and device-to-device connectivity. The user trajectory is reconstructed by measuring the user displacements step by step. Though step length can be estimated rather accurately, heading evaluation is extremely problematic in indoor environments. A magnetometer is typically used; however, its measurements are strongly perturbed. To improve the location accuracy, this paper proposes a novel cooperative system to estimate the direction of motion based on a machine learning approach for perturbation detection and filtering, combined with a consensus algorithm for performance augmentation by cooperative data fusion at multiple devices. A first algorithm filters out perturbed magnetometer measurements based on a priori information on the Earth’s magnetic field. A second algorithm aggregates groups of users walking in the same direction, while a third one combines the measurements of the aggregated users in a distributed way to extract a more accurate heading estimate. To the best of our knowledge, this is the first approach that combines machine learning with consensus algorithms for cooperative PDR. Compared to other methods in the literature, the method has the advantage of being infrastructure-free, fully distributed and robust to sensor failures thanks to the pre-filtering of perturbed measurements. Extensive indoor experiments show that the heading error is substantially reduced by the proposed approach, leading to noticeable enhancements in localization performance.
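
The three ingredients named in this abstract (filtering perturbed magnetometer readings, fusing headings across users, and step-by-step trajectory reconstruction) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the paper's method: a fixed magnitude threshold stands in for the machine learning filter, a centralized circular mean stands in for the distributed consensus algorithm, and all function names and the 25 to 65 microtesla bounds (an approximate range for the Earth's field magnitude) are choices made here.

```python
import math

# Assumed plausibility bounds for the Earth's field magnitude, in microtesla.
EARTH_FIELD_UT = (25.0, 65.0)


def is_unperturbed(mag_xyz, bounds=EARTH_FIELD_UT):
    """Keep a magnetometer sample only if its magnitude looks like the Earth's field."""
    magnitude = math.sqrt(sum(c * c for c in mag_xyz))
    return bounds[0] <= magnitude <= bounds[1]


def circular_mean(headings):
    """Average headings (radians) on the circle, so 350 and 10 degrees average to 0."""
    s = sum(math.sin(h) for h in headings)
    c = sum(math.cos(h) for h in headings)
    return math.atan2(s, c)


def dead_reckon(start, steps):
    """Accumulate (step_length, heading) pairs into a 2D path.

    Heading is measured clockwise from north, with north along the +y axis.
    """
    x, y = start
    path = [(x, y)]
    for length, heading in steps:
        x += length * math.sin(heading)
        y += length * math.cos(heading)
        path.append((x, y))
    return path
```

For example, `dead_reckon((0.0, 0.0), [(1.0, 0.0), (1.0, math.pi / 2)])` moves one unit north and then one unit east. The circular mean is used instead of an arithmetic mean because headings wrap around at 360 degrees.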

An Automated Machine Learning-Genetic Algorithm (AutoML-GA) Framework With Active Learning for Design Optimization

10.1115/1.0003772v ◽

2021 ◽

Author(s):

Opeoluwa Owoyele ◽

Pinaki Pal ◽

Alvaro Vidal Torreira

Keyword(s):

Machine Learning ◽

Genetic Algorithm ◽

Active Learning ◽

Design Optimization ◽

Automated Machine Learning

A Machine Learning Approach to Test Data Generation: A Case Study in Evaluation of Gene Finders

Machine Learning and Data Mining in Pattern Recognition - Lecture Notes in Computer Science ◽

10.1007/978-3-540-73499-4_56 ◽

2007 ◽

pp. 742-755 ◽

Cited By ~ 3

Author(s):

Henning Christiansen ◽

Christina Mackeprang Dahmcke

Keyword(s):

Machine Learning ◽

Test Data ◽

Learning Approach ◽

Test Data Generation ◽

Data Generation ◽

Machine Learning Approach

The relevant range of scales for multi-scale contextual spatial modelling

Scientific Reports ◽

10.1038/s41598-019-51395-3 ◽

2019 ◽

Vol 9 (1) ◽

Cited By ~ 1

Author(s):

Thorsten Behrens ◽

Raphael A. Viscarra Rossel ◽

Ruth Kerry ◽

Robert MacMillan ◽

Karsten Schmidt ◽

...

Keyword(s):

Machine Learning ◽

Spatial Autocorrelation ◽

Multiple Scales ◽

A Priori ◽

Computational Cost ◽

Scale Space ◽

Spatial Modelling ◽

Environmental Models ◽

Covariate Information ◽

Relevant Range

Abstract Spatial autocorrelation in the residuals of spatial environmental models can be due to missing covariate information. In many cases, this spatial autocorrelation can be accounted for by using covariates from multiple scales. Here, we propose a data-driven, objective and systematic method for deriving the relevant range of scales, with distinct upper and lower scale limits, for spatial modelling with machine learning and evaluated its effect on modelling accuracy. We also tested an approach that uses the variogram to see whether such an effective scale space can be approximated a priori and at smaller computational cost. Results showed that modelling with an effective scale space can improve spatial modelling with machine learning and that there is a strong correlation between properties of the variogram and the relevant range of scales. Hence, the variogram of a soil property can be used for a priori approximations of the effective scale space for contextual spatial modelling and is therefore an important analytical tool not only in geostatistics, but also for analyzing structural dependencies in contextual spatial modelling.
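
As background for the variogram mentioned in this abstract, the sketch below computes a standard empirical semivariogram, gamma(h) = sum of (z_i - z_j)^2 over the N(h) point pairs in lag bin h, divided by 2 N(h). It is offered only as an illustration of that general estimator: the function name, the binning scheme, and the parameters `bin_width` and `max_lag` are assumptions made here, and the abstract does not say how the authors derive the range of scales from the variogram.

```python
import math
from collections import defaultdict


def empirical_variogram(points, values, bin_width, max_lag):
    """Return {lag bin centre: semivariance} for all point pairs closer than max_lag."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            h = math.dist(points[i], points[j])  # separation distance of the pair
            if h >= max_lag:
                continue
            b = int(h // bin_width)  # index of the lag bin this pair falls in
            sums[b] += (values[i] - values[j]) ** 2
            counts[b] += 1
    return {(b + 0.5) * bin_width: sums[b] / (2 * counts[b]) for b in sorted(counts)}
```

A semivariance that rises with lag and then levels off indicates the distance beyond which observations are no longer spatially correlated, which is the kind of property the abstract relates to the relevant range of scales.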

Weak lensing shear estimation beyond the shape-noise limit: a machine learning approach

Monthly Notices of the Royal Astronomical Society ◽

10.1093/mnras/stz2991 ◽

2019 ◽

Cited By ~ 1

Author(s):

Ofer M Springer ◽

Eran O Ofek ◽

Yair Weiss ◽

Julian Merten

Keyword(s):

Machine Learning ◽

Galaxy Clusters ◽

Estimation Error ◽

A Priori ◽

Galaxy Cluster ◽

Weak Lensing ◽

Learning Approach ◽

Initial Attempt ◽

Statistical Errors ◽

Machine Learning Approach

Abstract Weak lensing shear estimation typically results in per-galaxy statistical errors significantly larger than the sought-after gravitational signal of only a few percent. These statistical errors are mostly a result of shape noise, an estimation error due to the diverse (and a priori unknown) morphology of individual background galaxies. These errors are inversely proportional to the limiting angular resolution at which localized objects, such as galaxy clusters, can be probed with weak lensing shear. In this work we report on our initial attempt to reduce statistical errors in weak lensing shear estimation using a machine learning approach: training a multi-layered convolutional neural network to directly estimate the shear given an observed background galaxy image. We train, calibrate and evaluate the performance and stability of our estimator using simulated galaxy images designed to mimic the distribution of HST observations of lensed background sources in the CLASH galaxy cluster survey. Using the trained estimator, we produce weak lensing shear maps of the cores of 20 galaxy clusters in the CLASH survey, demonstrating an RMS scatter reduced by approximately 26% when compared to maps produced with a commonly used shape estimator. This is equivalent to a survey speed enhancement of approximately 60%. However, given the non-transparent nature of the machine learning approach, this result requires further testing and validation. We provide Python code to train and test this estimator on both simulated and real galaxy cluster observations. We also provide updated weak lensing catalogues for the 20 CLASH galaxy clusters studied.

AutoDAL: Distributed Active Learning with Automatic Hyperparameter Selection

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5759 ◽

2020 ◽

Vol 34 (04) ◽

pp. 3537-3544

Author(s):

Xu Chen ◽

Brett Wujek

Keyword(s):

Machine Learning ◽

Active Learning ◽

Supervised Learning ◽

Learning Algorithm ◽

Learning Algorithms ◽

Learning System ◽

Automated Learning ◽

Benchmark Datasets ◽

Hyperparameter Selection ◽

Query Selection

Automated machine learning (AutoML) strives to establish an appropriate machine learning model for any dataset automatically with minimal human intervention. Although extensive research has been conducted on AutoML, most of it has focused on supervised learning. Research on automated semi-supervised learning and active learning algorithms is still limited. Implementation becomes more challenging when the algorithm is designed for a distributed computing environment. With this as motivation, we propose a novel automated learning system for distributed active learning (AutoDAL) to address these challenges. First, automated graph-based semi-supervised learning is conducted by aggregating the proposed cost functions from different compute nodes in a distributed manner. Subsequently, automated active learning is addressed by jointly optimizing hyperparameters in both the classification and query selection stages, leveraging graph loss minimization and entropy regularization. Moreover, we propose an efficient distributed active learning algorithm that scales to big data by first partitioning the unlabeled data and replicating the labeled data to different worker nodes in the classification stage, and then aggregating the data in the controller in the query selection stage. The proposed AutoDAL algorithm is applied to multiple benchmark datasets and a real-world electrocardiogram (ECG) dataset for classification. We demonstrate that the proposed AutoDAL algorithm is capable of achieving significantly better performance compared to several state-of-the-art AutoML approaches and active learning algorithms.

A machine learning approach to support deep brain stimulation programming

Revista Facultad de Ingeniería Universidad de Antioquia ◽

10.17533/udea.redin.20190729 ◽

2019 ◽

pp. 20-33

Author(s):

Viviana Gómez-Orozco ◽

Iván De La Pava Panche ◽

Andrés Marino Álvarez-Meza ◽

Mauricio Alexander Álvarez-López ◽

Álvaro Ángel Orozco-Gutiérrez

Keyword(s):

Machine Learning ◽

Deep Brain Stimulation ◽

Brain Stimulation ◽

Computational Cost ◽

Parameter Tuning ◽

Learning Approach ◽

Vast Number ◽

Stimulation Parameters ◽

Machine Learning Approach ◽

Deep Brain

Adjusting the stimulation parameters is a challenge in deep brain stimulation (DBS) therapy due to the vast number of different configurations available. As a result, systems based on the visualization of the volume of tissue activated (VTA) produced by a particular stimulation setting have been developed. However, the medical specialist still has to search, by trial and error, for a DBS set-up that generates the desired VTA. Therefore, our goal is to develop a DBS parameter tuning strategy for current clinical devices that allows a target VTA to be defined under biophysically viable constraints. We propose a machine learning approach for estimating the DBS parameter values for a given VTA, which comprises two main stages: i) a K-nearest neighbors-based deformation to define a target VTA preserving biophysically viable constraints; ii) a parameter estimation stage that consists of a data projection using metric learning to highlight relevant VTA properties, and a regression/classification algorithm to estimate the DBS parameters that generate the target VTA. Our methodology allows setting a biophysically compliant target VTA and accurately predicts the required configuration of stimulation parameters. Also, the performance of our approach is stable for both isotropic and anisotropic tissue conductivities. Furthermore, the computational cost of the trained system is acceptable for real-world implementations.

Machine-learning model led design to experimentally test species thermal limits: the case of kissing bugs (Triatominae)

10.1101/2020.10.05.326017 ◽

2020 ◽

Author(s):

Jorge E. Rabinovich ◽

Agustín Alvarez Costa ◽

Ignacio Muñoz ◽

Pablo E. Schilman ◽

Nicholas Fountain-Jones

Keyword(s):

Machine Learning ◽

Active Learning ◽

Exposure Time ◽

Thermal Tolerance ◽

Life Stage ◽

Species Distribution Modelling ◽

Climatic Variables ◽

Minor Role ◽

Learning Approach ◽

Thermal Limits

Abstract Species Distribution Modelling (SDM) determines habitat suitability of a species across geographic areas using macro-climatic variables; however, micro-habitats can buffer or exacerbate the influence of macro-climatic variables, requiring links between physiology and species persistence. Experimental approaches linking species physiology to micro-climate are complex, time-consuming and expensive. For example, it is difficult to judge a priori what combination of exposure time and temperature is important for a species' thermal tolerance. We tackled this problem using an active learning approach that utilized machine learning methods to guide thermal tolerance experimental design for three kissing-bug species (Hemiptera: Reduviidae: Triatominae), vectors of the parasite causing Chagas disease. As with other pathogen vectors, triatomines are well known to utilize micro-habitats and the associated shift in microclimate to enhance survival. Using a limited literature-collected dataset, our approach showed that temperature followed by exposure time were the strongest predictors of mortality; species played a minor role, and life stage was the least important. Further, we identified complex but biologically plausible nonlinear interactions between temperature and exposure time in shaping mortality, together setting the potential thermal limits of triatomines. The results from these data led to the design of new experiments with laboratory results that produced novel insights into the effects of temperature and exposure for the triatomines. These results, in turn, can be used to better model the micro-climatic envelope for the species. Here we demonstrate the power of an active learning approach to explore experimental space to design laboratory studies testing species thermal limits. Our analytical pipeline can be easily adapted to other systems and we provide code to allow practitioners to perform similar analyses. Not only does our approach have the potential to save time and money; it can also increase our understanding of the links between species physiology and climate, a topic of increasing ecological importance.

Author summary Species Distribution Modelling determines habitat suitability of a species across geographic areas using macro-climatic variables; however, micro-habitats can buffer or exacerbate the influence of macro-climatic variables, requiring links between physiology and species persistence. We tackled the problem of the combination of exposure time and temperature (a combination difficult to judge a priori) in determining species thermal tolerance, using an active learning approach that utilized machine learning methods to guide thermal tolerance experimental design for three kissing-bug species, vectors of the parasite causing Chagas disease. These bugs are found in micro-habitats with associated shifts in microclimate to enhance survival. Using a limited literature-collected dataset, we showed that temperature followed by exposure time were the strongest predictors of mortality, that species played a minor role, that life stage was the least important, and that a complex nonlinear interaction between temperature and exposure time shapes the mortality of kissing bugs. These results led to the design of new laboratory experiments to assess the effects of temperature and exposure for the triatomines. These results can be used to better model the micro-climatic envelope for species. Our active learning approach to explore experimental space to design laboratory studies can also be applied to other environmental conditions or species.
