AutoML to Date and Beyond: Challenges and Opportunities

Shubhra Kanti Karmaker (“Santu”); Md. Mahadi Hassan; Micah J. Smith; Lei Xu; Chengxiang Zhai; Kalyan Veeramachaneni

doi:10.1145/3470918


ACM Computing Surveys ◽

10.1145/3470918 ◽

2022 ◽

Vol 54 (8) ◽

pp. 1-36

Author(s):

Shubhra Kanti Karmaker (“Santu”) ◽

Md. Mahadi Hassan ◽

Micah J. Smith ◽

Lei Xu ◽

Chengxiang Zhai ◽

...

Keyword(s):

Machine Learning ◽

Training Dataset ◽

Learning Tools ◽

Specific Data ◽

Domain Experts ◽

Domain Specific ◽

Challenges And Opportunities ◽

Automated Machine Learning ◽

End To End ◽

Prediction Problems

As big data becomes ubiquitous across domains, and more and more stakeholders aspire to make the most of their data, demand for machine learning tools has spurred researchers to explore the possibilities of automated machine learning (AutoML). AutoML tools aim to make machine learning accessible for non-machine learning experts (domain experts), to improve the efficiency of machine learning, and to accelerate machine learning research. But although automation and efficiency are among AutoML’s main selling points, the process still requires human involvement at a number of vital steps, including understanding the attributes of domain-specific data, defining prediction problems, creating a suitable training dataset, and selecting a promising machine learning technique. These steps often require a prolonged back-and-forth that makes this process inefficient for domain experts and data scientists alike and keeps so-called AutoML systems from being truly automatic. In this review article, we introduce a new classification system for AutoML systems, using a seven-tiered schematic to distinguish these systems based on their level of autonomy. We begin by describing what an end-to-end machine learning pipeline actually looks like, and which subtasks of the machine learning pipeline have been automated so far. We highlight those subtasks that are still done manually—generally by a data scientist—and explain how this limits domain experts’ access to machine learning. Next, we introduce our novel level-based taxonomy for AutoML systems and define each level according to the scope of automation support provided. Finally, we lay out a roadmap for the future, pinpointing the research required to further automate the end-to-end machine learning pipeline and discussing important challenges that stand in the way of this ambitious goal.


Benchmark and Survey of Automated Machine Learning Frameworks

Journal of Artificial Intelligence Research ◽

10.1613/jair.1.11854 ◽

2021 ◽

Vol 70 ◽

pp. 409-472

Author(s):

Marc-André Zöller ◽

Marco F. Huber

Keyword(s):

Machine Learning ◽

Daily Life ◽

Real Data ◽

Data Sets ◽

Domain Experts ◽

Vital Part ◽

Machine Learning Applications ◽

Automated Machine Learning ◽

Learning Frameworks

Machine learning (ML) has become a vital part of many aspects of our daily life. However, building well-performing machine learning applications requires highly specialized data scientists and domain experts. Automated machine learning (AutoML) aims to reduce the demand for data scientists by enabling domain experts to build machine learning applications automatically without extensive knowledge of statistics and machine learning. This paper combines a survey of current AutoML methods with a benchmark of popular AutoML frameworks on real data sets. Driven by the frameworks selected for evaluation, we summarize and review important AutoML techniques and methods concerning every step in building an ML pipeline. The selected AutoML frameworks are evaluated on 137 data sets from established AutoML benchmark suites.


A review on machine learning for neutrino experiments

International Journal of Modern Physics A ◽

10.1142/s0217751x20430058 ◽

2020 ◽

Vol 35 (33) ◽

pp. 2043005

Author(s):

Fernanda Psihas ◽

Micah Groh ◽

Christopher Tunnell ◽

Karl Warburton

Keyword(s):

Machine Learning ◽

Neutrino Physics ◽

State Of The Art ◽

Machine Learning Algorithms ◽

Current Status ◽

Learning Tools ◽

The Standard Model ◽

Challenges And Opportunities ◽

Machine Learning Applications ◽

Neutrino Experiments

Neutrino experiments study the least understood of the Standard Model particles by observing their direct interactions with matter or searching for ultra-rare signals. The study of neutrinos typically requires overcoming large backgrounds, elusive signals, and small statistics. The introduction of state-of-the-art machine learning tools to solve analysis tasks has had a major impact on these challenges in neutrino experiments across the board. Machine learning algorithms have become an integral tool of neutrino physics, and their development is of great importance to the capabilities of next-generation experiments. An understanding of the roadblocks, both human and computational, and of the challenges that still exist in the application of these techniques is critical to their proper and beneficial use in physics applications. This review presents the current status of machine learning applications for neutrino physics in terms of the challenges and opportunities at the intersection of these two fields.


Cluster Validation

Encyclopedia of Data Warehousing and Mining, Second Edition ◽

10.4018/978-1-60566-010-3.ch038 ◽

2011 ◽

pp. 231-236

Author(s):

Ricardo Vilalta ◽

Tomasz Stepinski

Keyword(s):

Machine Learning ◽

Pattern Recognition ◽

Data Analysis ◽

Visual Inspection ◽

Automated Analysis ◽

Learning Tools ◽

Planetary Surfaces ◽

Martian Surface ◽

New Classification ◽

Domain Experts

Spacecraft orbiting a selected suite of planets and moons of our solar system are continuously sending long sequences of data back to Earth. The availability of such data provides an opportunity to invoke tools from machine learning and pattern recognition to extract patterns that can help to understand the geological processes shaping planetary surfaces. Due to the marked interest of the scientific community in this particular planet, we base our current discussion on Mars, where there are presently three spacecraft in orbit (e.g., NASA’s Mars Odyssey Orbiter, Mars Reconnaissance Orbiter, ESA’s Mars Express). Despite the abundance of available data describing the Martian surface, only a small fraction of the data is being analyzed in detail because current techniques for data analysis of planetary surfaces rely on simple visual inspection and descriptive characterization of surface landforms (Wilhelms, 1990). The demand for automated analysis of Mars’ surface has prompted the use of machine learning and pattern recognition tools to generate geomorphic maps, which are thematic maps of landforms (or topographical expressions). Examples of landforms are craters, valley networks, hills, basins, etc. Machine learning can play a vital role in automating the process of geomorphic mapping. A learning system can be employed either to fully automate the process of discovering meaningful landform classes using clustering techniques, or to predict the class of unlabeled landforms (after an expert has manually labeled a representative sample of the landforms) using classification techniques. The impact of these techniques on the analysis of Mars topography can be of immense value due to the sheer size of the Martian surface that remains unmapped.
While it is now clear that machine learning can greatly help in automating the detailed analysis of Mars’ surface (Stepinski et al., 2007; Stepinski et al., 2006; Bue and Stepinski, 2006; Stepinski and Vilalta, 2005), an interesting problem arises when an automated data analysis has produced a novel classification of a specific site’s landforms. The problem lies in the interpretation of this new classification as compared to traditionally derived classifications generated through visual inspection by domain experts. Is the new classification novel in all senses? Is the new classification only partially novel, with many landforms matching existing classifications? This article discusses how to assess the value of clusters generated by machine learning tools as applied to the analysis of Mars’ surface.


PyODDS: An End-to-end Outlier Detection System with Automated Machine Learning

Companion Proceedings of the Web Conference 2020 ◽

10.1145/3366424.3383530 ◽

2020 ◽

Author(s):

Yuening Li ◽

Daochen Zha ◽

Praveen Venugopal ◽

Na Zou ◽

Xia Hu

Keyword(s):

Machine Learning ◽

Outlier Detection ◽

Detection System ◽

Automated Machine Learning ◽

End To End


Machine Learning Tools For off-Target Early Safety Assessment of Small Molecules In Drug Discovery (Single Task Neural Networks Vs Automated Machine Learning)

10.21203/rs.3.rs-957525/v1 ◽

2021 ◽

Author(s):

Doha Naga ◽

Wolfgang Muster ◽

Eunice Musvasva ◽

Gerhard F. Ecker

Keyword(s):

Machine Learning ◽

Drug Discovery ◽

Pharmaceutical Companies ◽

Learning Tools ◽

Safety Issues ◽

Learning Framework ◽

Automated Machine Learning ◽

And Performance ◽

Preclinical Safety

Unpredicted drug safety issues constitute the majority of failures in the pharmaceutical industry according to several studies [1-3]. Some of these preclinical safety issues could be attributed to the non-selective binding of compounds to targets other than their intended therapeutic target, causing undesired adverse events. Consequently, pharmaceutical companies, including Roche, routinely run in-vitro safety screens to detect off-target activities prior to preclinical and clinical studies. Here we present a machine learning framework aimed at predicting the activities of ~4000 compounds on our in-house 50 off-target panel [4], directly from their structure. This framework is intended to guide chemists in the drug design process prior to synthesis and to accelerate drug discovery. It incorporates different ML approaches such as deep learning and automated machine learning. Outcomes from the different methods are compared in terms of efficiency and efficacy. The most important challenges and factors impacting model construction and performance, along with suggestions on how to overcome such challenges, are also discussed.


Machine Learning: Tools for End‐to‐End Cognition

Towards Cognitive Autonomous Networks ◽

10.1002/9781119586449.ch6 ◽

2020 ◽

pp. 203-254

Author(s):

Stephen Mwanje ◽

Marton Kajo ◽

Benedek Schultz

Keyword(s):

Machine Learning ◽

Learning Tools ◽

End To End


Pharm‐AutoML: an open‐source, end‐to‐end automated machine learning package for clinical outcome prediction

CPT Pharmacometrics & Systems Pharmacology ◽

10.1002/psp4.12621 ◽

2021 ◽

Author(s):

Gengbo Liu ◽

Dan Lu ◽

James Lu

Keyword(s):

Machine Learning ◽

Clinical Outcome ◽

Open Source ◽

Outcome Prediction ◽

Clinical Outcome Prediction ◽

Automated Machine Learning ◽

End To End


Comparison of Automated Machine Learning Tools for SMS Spam Message Filtering

10.1007/978-981-16-8059-5_18 ◽

2021 ◽

pp. 307-316

Author(s):

Waddah Saeed

Keyword(s):

Machine Learning ◽

Learning Tools ◽

Automated Machine Learning


Different mutation and crossover set of genetic programming in an automated machine learning

IAES International Journal of Artificial Intelligence (IJ-AI) ◽

10.11591/ijai.v9.i3.pp402-408 ◽

2020 ◽

Vol 9 (3) ◽

pp. 402

Author(s):

Suraya Masrom ◽

Masurah Mohamad ◽

Shahirah Mohamed Hatim ◽

Norhayati Baharun ◽

Nasiroh Omar ◽

...

Keyword(s):

Machine Learning ◽

Genetic Programming ◽

Evolutionary Algorithm ◽

Parameters Optimization ◽

Accuracy Score ◽

Learning Improvement ◽

Different Types ◽

Automated Machine Learning ◽

Prediction Problems ◽

Selection Of

Automated machine learning is a promising approach widely used to solve classification and prediction problems, and it currently receives much attention for modification and improvement. One line of ongoing work on improving automated machine learning is the inclusion of evolutionary algorithms such as Genetic Programming. The function of Genetic Programming is to find the best combination of solutions from the possible pipelines of machine learning modelling, including the selection of algorithms and the parameter optimization of the selected algorithm. As a member of the family of evolutionary algorithms, the effectiveness of Genetic Programming in providing the best machine learning pipelines for a given problem or dataset depends substantially on the algorithm's parameterization, including the mutation and crossover rates. This paper presents the effect of different pairs of mutation and crossover rates on automated machine learning performance when tested on different types of datasets. The findings can be used to support the theory that higher crossover rates tend to improve the algorithm's accuracy score, while lower crossover rates may cause the algorithm to converge at an earlier stage.


Evaluation of Different Machine Learning Tools in End-to-End Prediction of Vehicle Fuel Consumption in California

International Conference on Transportation and Development 2020 ◽

10.1061/9780784483138.007 ◽

2020 ◽

Author(s):

Mostafa Estaji ◽

John T. Harvey ◽

Erdem Coleri

Keyword(s):

Machine Learning ◽

Fuel Consumption ◽

Learning Tools ◽

End To End
