Multi-Fidelity Automatic Hyper-Parameter Tuning via Transfer Series Expansion

Author(s):  
Yi-Qi Hu ◽  
Yang Yu ◽  
Wei-Wei Tu ◽  
Qiang Yang ◽  
Yuqiang Chen ◽  
...  

Automatic machine learning (AutoML) aims at automatically choosing the best configuration for machine learning tasks. However, a configuration evaluation can be very time-consuming, particularly on learning tasks with large datasets. This limitation usually restrains derivative-free optimization from releasing its full power for a fine configuration search using many evaluations. To alleviate this limitation, in this paper, we propose a derivative-free optimization framework for AutoML using multi-fidelity evaluations. It uses many low-fidelity evaluations on small data subsets and very few high-fidelity evaluations on the full dataset. However, the low-fidelity evaluations can be badly biased, and need to be corrected at only a very low cost. We thus propose the Transfer Series Expansion (TSE), which learns the low-fidelity correction predictor efficiently by linearly combining a set of base predictors. The base predictors can be obtained cheaply from down-scaled and previously experienced tasks. Experimental results on real-world AutoML problems verify that the proposed framework can significantly accelerate derivative-free configuration search by making use of the multi-fidelity evaluations.
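The core of the series-expansion idea above is to correct a biased low-fidelity evaluation with a linear combination of cheap base predictors, fitting the combination weights on only a handful of expensive high-fidelity evaluations. The sketch below illustrates that linear-combination step with ordinary least squares on the high-fidelity residuals; the function names, the least-squares fitting choice, and the data shapes are illustrative assumptions, not the authors' actual TSE procedure.

```python
# Sketch of the Transfer Series Expansion idea: correct biased low-fidelity
# evaluations by a linear combination of base predictors, with weights fitted
# on only a few expensive high-fidelity evaluations. Illustrative only.

def fit_weights(base_preds, residuals):
    """Least-squares weights for combining base predictors.

    base_preds: one row per high-fidelity sample, each row holding the
                base predictors' outputs for that configuration.
    residuals:  high-fidelity value minus low-fidelity value per sample.
    Solves the normal equations (X^T X) w = X^T y by Gaussian elimination,
    assuming the handful of base predictors is well-conditioned.
    """
    k = len(base_preds[0])
    A = [[sum(r[i] * r[j] for r in base_preds) for j in range(k)]
         for i in range(k)]
    b = [sum(r[i] * y for r, y in zip(base_preds, residuals)) for i in range(k)]
    # Forward elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    w = [0.0] * k
    for r in range(k - 1, -1, -1):
        w[r] = (b[r] - sum(A[r][c] * w[c] for c in range(r + 1, k))) / A[r][r]
    return w

def corrected_eval(low_fid_value, base_outputs, w):
    """High-fidelity estimate = low-fidelity value + learned correction."""
    return low_fid_value + sum(wi * bi for wi, bi in zip(w, base_outputs))
```

With the weights fitted, each new configuration only needs a cheap low-fidelity run plus the base predictors' outputs to obtain a corrected estimate.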

2020 ◽  
Author(s):  
John Hancock ◽  
Taghi M Khoshgoftaar

Abstract Gradient Boosted Decision Trees (GBDT’s) are a powerful tool for classification and regression tasks in Big Data. Researchers should be familiar with the strengths and weaknesses of current implementations of GBDT’s in order to use them effectively and make successful contributions. CatBoost is a member of the family of GBDT machine learning ensemble techniques. Since its debut in late 2018, researchers have successfully used CatBoost for machine learning studies involving Big Data. We take this opportunity to review recent research on CatBoost as it relates to Big Data, and learn best practices from studies that cast CatBoost in a positive light, as well as studies where CatBoost does not outshine other techniques, since we can learn lessons from both types of scenarios. Furthermore, as a Decision Tree based algorithm, CatBoost is well-suited to machine learning tasks involving categorical, heterogeneous data. Recent work across multiple disciplines illustrates CatBoost’s effectiveness and shortcomings in classification and regression tasks. Another important issue we expose in literature on CatBoost is its sensitivity to hyper-parameters and the importance of hyper-parameter tuning. One contribution we make is to take an interdisciplinary approach to cover studies related to CatBoost in a single work. This provides researchers an in-depth understanding to help clarify proper application of CatBoost in solving problems. To the best of our knowledge, this is the first survey that studies all works related to CatBoost in a single publication.
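As a bare-bones illustration of the GBDT family this survey covers, the sketch below fits gradient-boosted decision stumps for squared-error regression: each stump fits the current residuals, and predictions accumulate with a learning rate. This is not CatBoost's actual algorithm; CatBoost layers ordered boosting and native categorical-feature handling (among other refinements) on top of this basic additive scheme, and all names here are illustrative.

```python
# Minimal gradient boosting with decision stumps (squared-error regression).
# Illustrates the generic GBDT scheme only, not CatBoost's implementation.

def fit_stump(xs, residuals):
    """Find the single-feature threshold split minimizing squared error.

    Assumes at least two distinct feature values.
    """
    best = None
    for t in sorted(set(xs))[:-1]:  # the largest value cannot split
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, lmean, rmean)
    return best[1:]

def fit_gbdt(xs, ys, n_trees=200, lr=0.1):
    """Boosting loop: each stump is fitted to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        resid = [y - p for y, p in zip(ys, preds)]
        t, lmean, rmean = fit_stump(xs, resid)
        stumps.append((t, lmean, rmean))
        preds = [p + lr * (lmean if x <= t else rmean)
                 for p, x in zip(preds, xs)]
    return base, lr, stumps

def predict(model, x):
    """Sum the shrunken stump contributions on top of the base prediction."""
    base, lr, stumps = model
    return base + sum(lr * (l if x <= t else r) for t, l, r in stumps)
```

Even this toy version exposes the hyper-parameter sensitivity the survey highlights: the ensemble size `n_trees` and the learning rate `lr` jointly control how closely the residuals are driven to zero.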



AI & Society ◽  
2021 ◽  
Author(s):  
Jan Kaiser ◽  
German Terrazas ◽  
Duncan McFarlane ◽  
Lavindra de Silva

Abstract Machine learning (ML) is increasingly used to enhance production systems and meet the requirements of a rapidly evolving manufacturing environment. Compared to larger companies, however, small- and medium-sized enterprises (SMEs) lack the resources, available data and skills, which impedes the adoption of analytics solutions. This paper proposes a preliminary yet general approach to identifying low-cost analytics solutions for manufacturing SMEs, with particular emphasis on ML. The initial studies seem to suggest that, contrary to what is usually thought at first glance, SMEs seldom need digital solutions built on advanced ML algorithms that require extensive data preparation, laborious parameter tuning and a comprehensive understanding of the underlying problem. If an analytics solution does require learning capabilities, a ‘simple solution’, which we will characterise in this paper, should be sufficient.


2021 ◽  
pp. 11-22
Author(s):  
Runheng Ran ◽  
Haozhen Situ

Quantum computing provides prospects for improving machine learning, mainly in two ways: accelerating computation and improving model performance. As an important property of machine learning models, generalization ability characterizes a model's capacity to predict unknown data. To address the question of whether quantum machine learning models provide reliable generalization ability, quantum circuits with hierarchical structures are explored for classifying both classical data and quantum state data. We also compare three derivative-free optimization methods, i.e., Covariance Matrix Adaptation Evolution Strategy (CMA-ES), Constrained Optimization by Linear Approximation (COBYLA) and Powell's method. Numerical results show that these quantum circuits perform well in terms of both trainability and generalization ability.
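Derivative-free methods like those compared above tune circuit parameters using only cost evaluations, with no gradients. As a toy stand-in, the sketch below tunes the rotation angle of a simulated one-qubit variational circuit Ry(theta)|0> so it prepares |1>, using a simple compass (pattern) search; CMA-ES, COBYLA and Powell's method are far more sophisticated members of the same derivative-free family, and the circuit, cost, and optimizer here are illustrative assumptions rather than the paper's setup.

```python
import math

# Toy derivative-free tuning of a one-qubit variational circuit.
# State Ry(theta)|0> = (cos(theta/2), sin(theta/2)), so P(|1>) = sin^2(theta/2);
# the cost is minimized (at zero) when theta reaches pi.

def cost(theta):
    """Squared error between P(|1>) and the target probability 1."""
    p1 = math.sin(theta / 2.0) ** 2
    return (p1 - 1.0) ** 2

def compass_search(f, x0, step=1.0, tol=1e-4):
    """1-D pattern search: probe x +/- step, move downhill, else halve step.

    Uses only function evaluations, like the derivative-free optimizers
    compared in the text, but with none of their model-building machinery.
    """
    x, fx = x0, f(x0)
    while step > tol:
        moved = False
        for cand in (x + step, x - step):
            fc = f(cand)
            if fc < fx:
                x, fx = cand, fc
                moved = True
                break
        if not moved:
            step /= 2.0
    return x, fx

theta, final_cost = compass_search(cost, 0.5)
```

Replacing `compass_search` with a library optimizer (e.g. a COBYLA or Powell routine) changes only the search strategy; the circuit-simulation cost function stays the same.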


2019 ◽  
Vol 35 (9) ◽  
pp. 095005 ◽  
Author(s):  
Nikola B Kovachki ◽  
Andrew M Stuart



2019 ◽  
Author(s):  
Qiannan Duan ◽  
Jianchao Lee ◽  
Jinhong Gao ◽  
Jiayuan Chen ◽  
Yachao Lian ◽  
...  

Machine learning (ML) has brought significant technological innovations in many fields, but it has not yet been widely embraced by most researchers in the natural sciences. Traditional approaches to chemical analysis cannot meet big data's definition and requirements for running ML. Over the years, we have focused on building a more versatile and low-cost approach to acquiring the copious amounts of data contained in a chemical reaction. The generated data are well suited to ML exploration of the vast space of chemical effects. As proof of concept, we carried out a case study on acute toxicity testing covering the whole routine: model building, chip preparation, data collection, and ML training. Such a strategy will probably play an important role in connecting ML with much research in the natural sciences in the future.


2020 ◽  
Vol 178 ◽  
pp. 65-74
Author(s):  
Ksenia Balabaeva ◽  
Liya Akmadieva ◽  
Sergey Kovalchuk

Nanoscale ◽  
2021 ◽  
Vol 13 (6) ◽  
pp. 3853-3859
Author(s):  
Ryosuke Mizuguchi ◽  
Yasuhiko Igarashi ◽  
Hiroaki Imai ◽  
Yuya Oaki

Lateral sizes of the exfoliated transition-metal–oxide nanosheets were predicted and controlled with the assistance of machine learning.

