Multi-Fidelity Automatic Hyper-Parameter Tuning via Transfer Series Expansion

Author(s):  
Yi-Qi Hu ◽  
Yang Yu ◽  
Wei-Wei Tu ◽  
Qiang Yang ◽  
Yuqiang Chen ◽  
...  

Automatic machine learning (AutoML) aims at automatically choosing the best configuration for machine learning tasks. However, a configuration evaluation can be very time-consuming, particularly on learning tasks with large datasets. This limitation usually restrains derivative-free optimization from releasing its full power for a fine configuration search using many evaluations. To alleviate this limitation, in this paper, we propose a derivative-free optimization framework for AutoML using multi-fidelity evaluations. It uses many low-fidelity evaluations on small data subsets and very few high-fidelity evaluations on the full dataset. However, the low-fidelity evaluations can be badly biased, and need to be corrected at only a very low cost. We thus propose the Transfer Series Expansion (TSE), which learns the low-fidelity correction predictor efficiently by linearly combining a set of base predictors. The base predictors can be obtained cheaply from down-scaled and previously experienced tasks. Experimental results on real-world AutoML problems verify that the proposed framework can significantly accelerate derivative-free configuration search by making use of the multi-fidelity evaluations.
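The core of the series-expansion idea above is to correct a biased low-fidelity evaluation with a linear combination of cheap base predictors, fitting the combination weights on only a handful of expensive high-fidelity evaluations. The sketch below illustrates that linear-combination step with ordinary least squares on the high-fidelity residuals; the function names, the least-squares fitting choice, and the data shapes are illustrative assumptions, not the authors' actual TSE procedure.

```python
# Sketch of the Transfer Series Expansion idea: correct biased low-fidelity
# evaluations by a linear combination of base predictors, with weights fitted
# on only a few expensive high-fidelity evaluations. Illustrative only.

def fit_weights(base_preds, residuals):
    """Least-squares weights for combining base predictors.

    base_preds: one row per high-fidelity sample, each row holding the
                base predictors' outputs for that configuration.
    residuals:  high-fidelity value minus low-fidelity value per sample.
    Solves the normal equations (X^T X) w = X^T y by Gaussian elimination,
    assuming the handful of base predictors is well-conditioned.
    """
    k = len(base_preds[0])
    A = [[sum(r[i] * r[j] for r in base_preds) for j in range(k)]
         for i in range(k)]
    b = [sum(r[i] * y for r, y in zip(base_preds, residuals)) for i in range(k)]
    # Forward elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    w = [0.0] * k
    for r in range(k - 1, -1, -1):
        w[r] = (b[r] - sum(A[r][c] * w[c] for c in range(r + 1, k))) / A[r][r]
    return w

def corrected_eval(low_fid_value, base_outputs, w):
    """High-fidelity estimate = low-fidelity value + learned correction."""
    return low_fid_value + sum(wi * bi for wi, bi in zip(w, base_outputs))
```

With the weights fitted, each new configuration only needs a cheap low-fidelity run plus the base predictors' outputs to obtain a corrected estimate.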

2020 ◽  
Author(s):  
John Hancock ◽  
Taghi M Khoshgoftaar

Abstract Gradient Boosted Decision Trees (GBDT’s) are a powerful tool for classification and regression tasks in Big Data. Researchers should be familiar with the strengths and weaknesses of current implementations of GBDT’s in order to use them effectively and make successful contributions. CatBoost is a member of the family of GBDT machine learning ensemble techniques. Since its debut in late 2018, researchers have successfully used CatBoost for machine learning studies involving Big Data. We take this opportunity to review recent research on CatBoost as it relates to Big Data, and learn best practices from studies that cast CatBoost in a positive light, as well as studies where CatBoost does not outshine other techniques, since we can learn lessons from both types of scenarios. Furthermore, as a Decision Tree based algorithm, CatBoost is well-suited to machine learning tasks involving categorical, heterogeneous data. Recent work across multiple disciplines illustrates CatBoost’s effectiveness and shortcomings in classification and regression tasks. Another important issue we expose in literature on CatBoost is its sensitivity to hyper-parameters and the importance of hyper-parameter tuning. One contribution we make is to take an interdisciplinary approach to cover studies related to CatBoost in a single work. This provides researchers an in-depth understanding to help clarify proper application of CatBoost in solving problems. To the best of our knowledge, this is the first survey that studies all works related to CatBoost in a single publication.
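As a bare-bones illustration of the GBDT family this survey covers, the sketch below fits gradient-boosted decision stumps for squared-error regression: each stump fits the current residuals, and predictions accumulate with a learning rate. This is not CatBoost's actual algorithm; CatBoost layers ordered boosting and native categorical-feature handling (among other refinements) on top of this basic additive scheme, and all names here are illustrative.

```python
# Minimal gradient boosting with decision stumps (squared-error regression).
# Illustrates the generic GBDT scheme only, not CatBoost's implementation.

def fit_stump(xs, residuals):
    """Find the single-feature threshold split minimizing squared error.

    Assumes at least two distinct feature values.
    """
    best = None
    for t in sorted(set(xs))[:-1]:  # the largest value cannot split
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, lmean, rmean)
    return best[1:]

def fit_gbdt(xs, ys, n_trees=200, lr=0.1):
    """Boosting loop: each stump is fitted to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        resid = [y - p for y, p in zip(ys, preds)]
        t, lmean, rmean = fit_stump(xs, resid)
        stumps.append((t, lmean, rmean))
        preds = [p + lr * (lmean if x <= t else rmean)
                 for p, x in zip(preds, xs)]
    return base, lr, stumps

def predict(model, x):
    """Sum the shrunken stump contributions on top of the base prediction."""
    base, lr, stumps = model
    return base + sum(lr * (l if x <= t else r) for t, l, r in stumps)
```

Even this toy version exposes the hyper-parameter sensitivity the survey highlights: the ensemble size `n_trees` and the learning rate `lr` jointly control how closely the residuals are driven to zero.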



AI & Society ◽  
2021 ◽  
Author(s):  
Jan Kaiser ◽  
German Terrazas ◽  
Duncan McFarlane ◽  
Lavindra de Silva

Abstract Machine learning (ML) is increasingly used to enhance production systems and meet the requirements of a rapidly evolving manufacturing environment. Compared to larger companies, however, small- and medium-sized enterprises (SMEs) lack the resources, available data and skills, which impedes the adoption of analytics solutions. This paper proposes a preliminary yet general approach to identifying low-cost analytics solutions for manufacturing SMEs, with particular emphasis on ML. The initial studies seem to suggest that, contrary to what is usually thought at first glance, SMEs seldom need digital solutions built on advanced ML algorithms that require extensive data preparation, laborious parameter tuning and a comprehensive understanding of the underlying problem. If an analytics solution does require learning capabilities, a ‘simple solution’, which we will characterise in this paper, should be sufficient.


2021 ◽  
pp. 11-22
Author(s):  
Runheng Ran ◽  
Haozhen Situ

Quantum computing provides prospects for improving machine learning, mainly in two ways: accelerating computation and improving model performance. As an important property of machine learning models, generalization ability characterizes a model's capacity to predict unknown data. To address the question of whether quantum machine learning models provide reliable generalization ability, quantum circuits with hierarchical structures are explored for classifying both classical data and quantum state data. We also compare three derivative-free optimization methods, i.e., Covariance Matrix Adaptation Evolution Strategy (CMA-ES), Constrained Optimization by Linear Approximation (COBYLA) and Powell's method. Numerical results show that these quantum circuits perform well in terms of both trainability and generalization ability.
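Derivative-free methods like those compared above tune circuit parameters using only cost evaluations, with no gradients. As a toy stand-in, the sketch below tunes the rotation angle of a simulated one-qubit variational circuit Ry(theta)|0> so it prepares |1>, using a simple compass (pattern) search; CMA-ES, COBYLA and Powell's method are far more sophisticated members of the same derivative-free family, and the circuit, cost, and optimizer here are illustrative assumptions rather than the paper's setup.

```python
import math

# Toy derivative-free tuning of a one-qubit variational circuit.
# State Ry(theta)|0> = (cos(theta/2), sin(theta/2)), so P(|1>) = sin^2(theta/2);
# the cost is minimized (at zero) when theta reaches pi.

def cost(theta):
    """Squared error between P(|1>) and the target probability 1."""
    p1 = math.sin(theta / 2.0) ** 2
    return (p1 - 1.0) ** 2

def compass_search(f, x0, step=1.0, tol=1e-4):
    """1-D pattern search: probe x +/- step, move downhill, else halve step.

    Uses only function evaluations, like the derivative-free optimizers
    compared in the text, but with none of their model-building machinery.
    """
    x, fx = x0, f(x0)
    while step > tol:
        moved = False
        for cand in (x + step, x - step):
            fc = f(cand)
            if fc < fx:
                x, fx = cand, fc
                moved = True
                break
        if not moved:
            step /= 2.0
    return x, fx

theta, final_cost = compass_search(cost, 0.5)
```

Replacing `compass_search` with a library optimizer (e.g. a COBYLA or Powell routine) changes only the search strategy; the circuit-simulation cost function stays the same.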


2019 ◽  
Vol 35 (9) ◽  
pp. 095005 ◽  
Author(s):  
Nikola B Kovachki ◽  
Andrew M Stuart



2019 ◽  
Author(s):  
Qiannan Duan ◽  
Jianchao Lee ◽  
Jinhong Gao ◽  
Jiayuan Chen ◽  
Yachao Lian ◽  
...  

Machine learning (ML) has brought significant technological innovations in many fields, but it has not yet been widely embraced by most researchers in the natural sciences. Traditional approaches to chemical analysis cannot meet big data's definition and requirements for running ML. Over the years, we have focused on building a more versatile and low-cost approach to acquiring the copious amounts of data contained in a chemical reaction. The generated data are well suited to ML exploration of the vast space of chemical effects. As proof of concept, we carried out a case study on acute toxicity testing covering the whole routine: model building, chip preparation, data collection, and ML training. Such a strategy will probably play an important role in connecting ML with much research in the natural sciences in the future.


2020 ◽  
Vol 178 ◽  
pp. 65-74
Author(s):  
Ksenia Balabaeva ◽  
Liya Akmadieva ◽  
Sergey Kovalchuk

Nanoscale ◽  
2021 ◽  
Vol 13 (6) ◽  
pp. 3853-3859
Author(s):  
Ryosuke Mizuguchi ◽  
Yasuhiko Igarashi ◽  
Hiroaki Imai ◽  
Yuya Oaki

Lateral sizes of the exfoliated transition-metal–oxide nanosheets were predicted and controlled with the assistance of machine learning.

