On Constructing Ensembles for Combinatorial Optimisation

2018
Vol 26 (1)
pp. 67-87
Author(s):  
Emma Hart ◽  
Kevin Sim

Although the use of ensemble methods in machine-learning is ubiquitous due to their proven ability to outperform their constituent algorithms, ensembles of optimisation algorithms have received relatively little attention. Existing approaches lag behind machine-learning in both theory and practice, with no principled design guidelines available. In this article, we address fundamental questions regarding ensemble composition in optimisation using the domain of bin-packing as an example. In particular, we investigate the trade-off between accuracy and diversity, and whether diversity metrics can be used as a proxy for constructing an ensemble, proposing a number of novel metrics for comparing algorithm diversity. We find that randomly composed ensembles can outperform ensembles of high-performing algorithms under certain conditions and that judicious choice of diversity metric is required to construct good ensembles. The method and findings can be generalised to any metaheuristic ensemble, and lead to better understanding of how to undertake principled ensemble design.
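For illustration only (not the authors' method), the following minimal Python sketch composes an ensemble from a few classic bin-packing heuristics and takes the best result per instance; the heuristics, instance generator, and bin capacity are assumptions made for the example.

```python
import random

def first_fit(items, capacity):
    """Place each item in the first open bin that fits; return bins used."""
    bins = []
    for item in items:
        for i, load in enumerate(bins):
            if load + item <= capacity:
                bins[i] += item
                break
        else:
            bins.append(item)
    return len(bins)

def first_fit_decreasing(items, capacity):
    """First-fit applied to items sorted from largest to smallest."""
    return first_fit(sorted(items, reverse=True), capacity)

def best_fit(items, capacity):
    """Place each item in the bin leaving the least remaining space."""
    bins = []
    for item in items:
        fits = [(capacity - load - item, i) for i, load in enumerate(bins)
                if load + item <= capacity]
        if fits:
            _, i = min(fits)
            bins[i] += item
        else:
            bins.append(item)
    return len(bins)

HEURISTICS = [first_fit, first_fit_decreasing, best_fit]

def ensemble_bins(items, capacity):
    """Ensemble output: the fewest bins achieved by any constituent heuristic."""
    return min(h(items, capacity) for h in HEURISTICS)

if __name__ == "__main__":
    random.seed(0)
    instances = [[random.randint(1, 100) for _ in range(120)] for _ in range(20)]
    for name, solver in [("first_fit alone", first_fit), ("ensemble", ensemble_bins)]:
        total = sum(solver(inst, 150) for inst in instances)
        print(f"{name}: {total} bins over all instances")
```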

2019
Author(s):  
Kasper Van Mens ◽  
Joran Lokkerbol ◽  
Richard Janssen ◽  
Robert de Lange ◽  
Bea Tiemens

BACKGROUND It remains a challenge to predict which treatment will work for which patient in mental healthcare. OBJECTIVE In this study we compare machine learning algorithms to predict, during treatment, which patients will not benefit from brief mental health treatment, and present trade-offs that must be considered before an algorithm can be used in clinical practice. METHODS Using an anonymized dataset containing routine outcome monitoring data from a mental healthcare organization in the Netherlands (n = 2,655), we applied three machine learning algorithms to predict treatment outcome. The algorithms were internally validated with cross-validation on a training sample (n = 1,860) and externally validated on an unseen test sample (n = 795). RESULTS The performance of the three algorithms did not differ significantly on the test set. With a default classification cut-off at 0.5 predicted probability, the extreme gradient boosting algorithm showed the highest positive predictive value (PPV) of 0.71 (0.61–0.77), with a sensitivity of 0.35 (0.29–0.41) and an area under the curve of 0.78. A trade-off can be made between PPV and sensitivity by choosing different cut-off probabilities. With a cut-off at 0.63, the PPV increased to 0.87 and the sensitivity dropped to 0.17. With a cut-off at 0.38, the PPV decreased to 0.61 and the sensitivity increased to 0.57. CONCLUSIONS Machine learning can be used to predict treatment outcomes based on routine monitoring data. This allows practitioners to choose their own trade-off between being selective and more certain versus inclusive and less certain.
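As a hedged illustration of the cut-off trade-off described above, the sketch below trains a gradient boosting classifier on synthetic data and reports PPV and sensitivity at the three cut-offs mentioned in the abstract; the dataset, model settings, and resulting numbers are placeholders, not the study's.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for routine outcome monitoring data (illustrative only).
X, y = make_classification(n_samples=2655, n_features=20,
                           weights=[0.6, 0.4], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]

def ppv_sensitivity(y_true, scores, cutoff):
    """Positive predictive value and sensitivity at a given probability cut-off."""
    pred = scores >= cutoff
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    fn = np.sum(~pred & (y_true == 1))
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    return ppv, sens

for cutoff in (0.38, 0.50, 0.63):  # the cut-offs discussed in the abstract
    ppv, sens = ppv_sensitivity(y_test, proba, cutoff)
    print(f"cut-off {cutoff:.2f}: PPV={ppv:.2f}, sensitivity={sens:.2f}")
```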


Author(s):  
Mark Endrei ◽  
Chao Jin ◽  
Minh Ngoc Dinh ◽  
David Abramson ◽  
Heidi Poxon ◽  
...  

Rising power costs and constraints are driving a growing focus on the energy efficiency of high performance computing systems. The unique characteristics of a particular system and workload, and their effect on performance and energy efficiency, are typically difficult for application users to assess and to control. Settings for optimum performance and energy efficiency can also diverge, so we need to identify trade-off options that guide a suitable balance between energy use and performance. We present statistical and machine learning models that require only a small number of runs to make accurate Pareto-optimal trade-off predictions using parameters that users can control. We study model training and validation using several parallel kernels and more complex workloads, including Algebraic Multigrid (AMG), the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS), and Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics (LULESH). We demonstrate that we can train the models using as few as 12 runs, with prediction error of less than 10%. Our AMG results identify trade-off options that provide up to 45% improvement in energy efficiency for around 10% performance loss. We reduce the sample measurement time required for AMG by 90%, from 13 h to 74 min.
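The general workflow can be sketched as follows, with the caveat that the controllable parameters (thread count, CPU frequency), the measurements, and the regression models are illustrative assumptions rather than the paper's setup: fit cheap surrogate models to a dozen measured runs, predict runtime and energy over a grid of candidate configurations, and keep the Pareto-optimal predictions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Illustrative training data: a dozen measured runs (threads, cpu_freq_GHz)
# with observed runtime (s) and energy (J).  All values are made up.
X = np.array([[t, f] for t in (4, 8, 16, 32) for f in (1.8, 2.4, 3.0)])
runtime = np.array([95, 80, 72, 60, 48, 42, 41, 33, 30, 35, 28, 26], dtype=float)
energy = np.array([620, 700, 810, 520, 560, 640, 470, 500, 580, 460, 490, 570], dtype=float)

time_model = RandomForestRegressor(random_state=0).fit(X, runtime)
energy_model = RandomForestRegressor(random_state=0).fit(X, energy)

# Predict over a denser grid of candidate configurations the user could choose.
candidates = np.array([[t, f] for t in (4, 8, 12, 16, 24, 32)
                       for f in np.linspace(1.8, 3.0, 7)])
pred = np.column_stack([time_model.predict(candidates),
                        energy_model.predict(candidates)])

def pareto_front(points):
    """Indices of points not dominated in both objectives (lower is better)."""
    keep = []
    for i, p in enumerate(points):
        dominated = any((q <= p).all() and (q < p).any()
                        for j, q in enumerate(points) if j != i)
        if not dominated:
            keep.append(i)
    return keep

for i in pareto_front(pred):
    t, f = candidates[i]
    print(f"threads={int(t):2d} freq={f:.1f} GHz -> "
          f"time~{pred[i, 0]:.0f}s energy~{pred[i, 1]:.0f}J")
```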


2021
Vol 34 (2)
pp. 541-549
Author(s):  
Leihong Wu ◽  
Ruili Huang ◽  
Igor V. Tetko ◽  
Zhonghua Xia ◽  
Joshua Xu ◽  
...  

2021
Vol 10 (1)
pp. 42
Author(s):  
Kieu Anh Nguyen ◽  
Walter Chen ◽  
Bor-Shiun Lin ◽  
Uma Seeboonruang

Although machine learning has been extensively used in various fields, it has only recently been applied to soil erosion pin modeling. To improve upon previous methods of quantifying soil erosion based on erosion pin measurements, this study explored the possible application of ensemble machine learning algorithms to the Shihmen Reservoir watershed in northern Taiwan. Three categories of ensemble methods were considered in this study: (a) bagging, (b) boosting, and (c) stacking. The bagging methods in this study are bagged multivariate adaptive regression splines (bagged MARS) and random forest (RF), and the boosting methods are Cubist and gradient boosting machine (GBM). Finally, the stacking method is an ensemble method that uses a meta-model to combine the predictions of base models. This study used RF and GBM as the meta-models, and decision tree, linear regression, artificial neural network, and support vector machine as the base models. The dataset used in this study was sampled using stratified random sampling to achieve a 70/30 split for the training and test data, and the process was repeated three times. The performance of the six ensemble methods in the three categories was analyzed based on the average of the three attempts. It was found that GBM performed best among the ensemble models, with the lowest root-mean-square error (RMSE = 1.72 mm/year), the highest Nash-Sutcliffe efficiency (NSE = 0.54), and the highest index of agreement (d = 0.81). This result was confirmed by a spatial comparison of the absolute differences (errors) between model predictions and observations for GBM and RF across the study area. In summary, the results show that, as a group, the bagging and boosting methods performed equally well, and the stacking method came third for the erosion pin dataset considered in this study.
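A minimal sketch of the stacking configuration described above, using scikit-learn's StackingRegressor with a GBM meta-model over decision tree, linear regression, neural network, and SVM base models; the data are synthetic placeholders rather than the Shihmen erosion pin measurements.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Placeholder data standing in for erosion-pin measurements (illustrative only).
X, y = make_regression(n_samples=500, n_features=10, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

base_models = [
    ("tree", DecisionTreeRegressor(random_state=0)),
    ("linear", LinearRegression()),
    ("ann", MLPRegressor(max_iter=2000, random_state=0)),
    ("svm", SVR()),
]

# GBM acts as the meta-model that combines the base models' predictions.
stack = StackingRegressor(estimators=base_models,
                          final_estimator=GradientBoostingRegressor(random_state=0))
stack.fit(X_train, y_train)

rmse = mean_squared_error(y_test, stack.predict(X_test)) ** 0.5
print(f"stacking RMSE on held-out data: {rmse:.2f}")
```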


Author(s):  
Joseph D. Romano ◽  
Trang T. Le ◽  
Weixuan Fu ◽  
Jason H. Moore

Automated machine learning (AutoML) and artificial neural networks (ANNs) have revolutionized the field of artificial intelligence by yielding incredibly high-performing models to solve a myriad of inductive learning tasks. In spite of their successes, little guidance exists on when to use one versus the other. Furthermore, relatively few tools exist that allow the integration of both AutoML and ANNs in the same analysis to yield results combining both of their strengths. Here, we present TPOT-NN—a new extension to the tree-based AutoML software TPOT—and use it to explore the behavior of automated machine learning augmented with neural network estimators (AutoML+NN), particularly when compared to non-NN AutoML in the context of simple binary classification on a number of public benchmark datasets. Our observations suggest that TPOT-NN is an effective tool that achieves greater classification accuracy than standard tree-based AutoML on some datasets, with no loss in accuracy on others. We also provide preliminary guidelines for performing AutoML+NN analyses, and recommend possible future directions for AutoML+NN methods research, especially in the context of TPOT.
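A hedged usage sketch comparing default tree-based TPOT with its NN-augmented configuration on a public binary classification dataset; the 'TPOT NN' config_dict string follows the TPOT-NN documentation and may vary between versions, and the small generation/population settings are chosen only to keep the example quick.

```python
# Minimal sketch, not the paper's benchmark protocol.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

configs = {
    "tree-based AutoML": None,           # TPOT's default search space
    "AutoML+NN (TPOT-NN)": "TPOT NN",    # adds neural network estimators (assumed config string)
}

for name, config in configs.items():
    automl = TPOTClassifier(generations=5, population_size=20,
                            config_dict=config, random_state=0, verbosity=2)
    automl.fit(X_train, y_train)
    print(name, "test accuracy:", automl.score(X_test, y_test))
```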


2020
Vol 7 (1)
Author(s):  
Miles L. Timpe ◽  
Maria Han Veiga ◽  
Mischa Knabenhans ◽  
Joachim Stadel ◽  
Stefano Marelli

In the late stages of terrestrial planet formation, pairwise collisions between planetary-sized bodies act as the fundamental agent of planet growth. These collisions can lead to either growth or disruption of the bodies involved and are largely responsible for shaping the final characteristics of the planets. Despite their critical role in planet formation, an accurate treatment of collisions has yet to be realized. While semi-analytic methods have been proposed, they remain limited to a narrow set of post-impact properties and have only achieved relatively low accuracies. However, the rise of machine learning and access to increased computing power have enabled novel data-driven approaches. In this work, we show that data-driven emulation techniques are capable of classifying and predicting the outcome of collisions with high accuracy and are generalizable to any quantifiable post-impact quantity. In particular, we focus on the dataset requirements, training pipeline, and classification and regression performance for four distinct data-driven techniques from machine learning (ensemble methods and neural networks) and uncertainty quantification (Gaussian processes and polynomial chaos expansion). We compare these methods to existing analytic and semi-analytic methods. Such data-driven emulators are poised to replace the methods currently used in N-body simulations, while avoiding the cost of direct simulation. This work is based on a new set of 14,856 SPH simulations of pairwise collisions between rotating, differentiated bodies at all possible mutual orientations.
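As a hedged illustration of one of the emulation techniques named above, the sketch below fits a Gaussian process regressor mapping a few pre-impact parameters to a post-impact quantity; the parameters, the toy ground-truth function, and the data are assumptions standing in for the SPH simulation set.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(0)

# Illustrative pre-impact parameters: impact velocity (v/v_esc), impact angle (deg),
# and projectile-to-target mass ratio.  Target: largest-remnant mass fraction.
X = np.column_stack([rng.uniform(1.0, 3.0, 200),
                     rng.uniform(0.0, 90.0, 200),
                     rng.uniform(0.1, 1.0, 200)])
# Toy ground truth standing in for SPH outcomes (an assumption, not real physics).
y = np.clip(1.2 - 0.3 * X[:, 0] - 0.002 * X[:, 1] + 0.1 * X[:, 2]
            + rng.normal(0, 0.02, 200), 0.0, 1.0)

# Anisotropic RBF kernel: one length scale per input dimension.
kernel = ConstantKernel() * RBF(length_scale=[1.0, 30.0, 0.5])
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X[:150], y[:150])

pred, std = gp.predict(X[150:], return_std=True)
print("mean absolute error:", np.mean(np.abs(pred - y[150:])))
print("mean predictive std:", std.mean())
```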


Algorithms
2019
Vol 12 (5)
pp. 99
Author(s):  
Kleopatra Pirpinia ◽  
Peter A. N. Bosman ◽  
Jan-Jakob Sonke ◽  
Marcel van Herk ◽  
Tanja Alderliesten

Current state-of-the-art medical deformable image registration (DIR) methods optimize a weighted sum of key objectives of interest. Having a pre-determined weight combination that leads to high-quality results for any instance of a specific DIR problem (i.e., a class solution) would facilitate clinical application of DIR. However, such a combination can vary widely for each instance and is currently often determined manually. A multi-objective optimization approach for DIR removes the need for manual tuning, providing a set of high-quality trade-off solutions. Here, we investigate machine learning for a multi-objective class solution, i.e., not a single weight combination but a set thereof that, when used on any instance of a specific DIR problem, approximates such a set of trade-off solutions. To this end, we employed a multi-objective evolutionary algorithm to learn sets of weight combinations for three breast DIR problems of increasing difficulty: 10 prone-prone cases, 4 prone-supine cases with limited deformations, and 6 prone-supine cases with larger deformations and image artefacts. Clinically acceptable results were obtained for the first two problems. Therefore, for DIR problems with limited deformations, a multi-objective class solution can be machine-learned and used to straightforwardly compute multiple high-quality DIR outcomes, potentially leading to more efficient use of DIR in clinical practice.
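A minimal sketch of the underlying idea, under the assumption of two toy registration objectives: each weight combination in a fixed "class solution" set is used to minimise a weighted sum, yielding a set of trade-off outcomes; in the paper the weight set is machine-learned rather than hand-picked.

```python
import numpy as np

def similarity_term(params):
    """Placeholder image-dissimilarity objective (lower is better)."""
    return float(np.sum((params - 1.0) ** 2))

def smoothness_term(params):
    """Placeholder deformation-smoothness objective (lower is better)."""
    return float(np.sum(params ** 2))

def register(weights, n_steps=200, lr=0.05):
    """Minimise the weighted sum of objectives with simple gradient descent."""
    w_sim, w_smooth = weights
    params = np.zeros(3)
    for _ in range(n_steps):
        grad = w_sim * 2 * (params - 1.0) + w_smooth * 2 * params
        params -= lr * grad
    return similarity_term(params), smoothness_term(params)

# A "class solution": a fixed set of weight combinations reused on every instance
# (hand-picked here; machine-learned in the paper).
weight_set = [(1.0, 0.1), (1.0, 0.5), (1.0, 1.0), (0.5, 1.0), (0.1, 1.0)]

for w in weight_set:
    sim, smooth = register(w)
    print(f"weights={w}: dissimilarity={sim:.3f}, smoothness penalty={smooth:.3f}")
```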


2012
pp. 1781-1789
Author(s):  
Sašo Džeroski ◽  
Panče Panov ◽  
Bernard Ženko

Electronics
2021
Vol 10 (21)
pp. 2717
Author(s):  
Nusrat Rouf ◽  
Majid Bashir Malik ◽  
Tasleem Arif ◽  
Sparsh Sharma ◽  
Saurabh Singh ◽  
...  

With the advent of technological marvels like global digitization, the prediction of the stock market has entered a technologically advanced era, revamping the old model of trading. With the ceaseless increase in market capitalization, stock trading has become a center of investment for many financial investors. Many analysts and researchers have developed tools and techniques that predict stock price movements and help investors make proper decisions. Advanced trading models enable researchers to predict the market using non-traditional textual data from social platforms. The application of advanced machine learning approaches such as text data analytics and ensemble methods has greatly increased prediction accuracies. Meanwhile, the analysis and prediction of stock markets continue to be one of the most challenging research areas due to dynamic, erratic, and chaotic data. This study explains the systematics of machine learning-based approaches for stock market prediction based on the deployment of a generic framework. Findings from the last decade (2011–2021), retrieved from online digital libraries and databases such as the ACM Digital Library and Scopus, were critically analyzed. Furthermore, an extensive comparative analysis was carried out to identify the direction of significance. The study should help emerging researchers understand the basics and advancements of this emerging area and thus carry on further research in promising directions.


2021
Author(s):  
Lars Holmberg

Machine Learning (ML) and Artificial Intelligence (AI) impact many aspects of human life, from recommending a significant other to assisting the search for extraterrestrial life. The area develops rapidly, and exciting unexplored design spaces are constantly laid bare. The focus of this work is one of these areas: ML systems where decisions concerning ML model training, usage, and selection of the target domain lie in the hands of domain experts. This work is thus concerned with ML systems that function as tools that augment and/or enhance human capabilities. The approach presented is denoted Human In Command ML (HIC-ML) systems. To enquire into this research domain, design experiments of varying fidelity were used. Two of these experiments focus on augmenting human capabilities and target the domains of commuting and battery sorting. One experiment focuses on enhancing human capabilities by identifying similar hand-painted plates. The experiments are used as illustrative examples to explore settings where domain experts can potentially train an ML model independently, interact with it in an iterative fashion, and interpret and understand its decisions. HIC-ML should be seen as a governance principle that focuses on adding value and meaning for users. In this work, concrete application areas are presented and discussed. To open up the area for the design of ML-based products, an abstract model for HIC-ML is constructed and design guidelines are proposed. In addition, terminology and abstractions useful when designing for explicability are presented, imposing structure and rigidity derived from scientific explanations. Together, this opens up for a contextual shift in ML and makes new application areas probable, areas that naturally couple the usage of AI technology to human virtues and can potentially, as a consequence, result in a democratisation of the usage of and knowledge about this powerful technology.

