scholarly journals AutoML to Date and Beyond: Challenges and Opportunities

2022 ◽  
Vol 54 (8) ◽  
pp. 1-36
Author(s):  
Shubhra Kanti Karmaker (“Santu”) ◽  
Md. Mahadi Hassan ◽  
Micah J. Smith ◽  
Lei Xu ◽  
Chengxiang Zhai ◽  
...  

As big data becomes ubiquitous across domains, and more and more stakeholders aspire to make the most of their data, demand for machine learning tools has spurred researchers to explore the possibilities of automated machine learning (AutoML). AutoML tools aim to make machine learning accessible for non-machine learning experts (domain experts), to improve the efficiency of machine learning, and to accelerate machine learning research. But although automation and efficiency are among AutoML’s main selling points, the process still requires human involvement at a number of vital steps, including understanding the attributes of domain-specific data, defining prediction problems, creating a suitable training dataset, and selecting a promising machine learning technique. These steps often require a prolonged back-and-forth that makes this process inefficient for domain experts and data scientists alike and keeps so-called AutoML systems from being truly automatic. In this review article, we introduce a new classification system for AutoML systems, using a seven-tiered schematic to distinguish these systems based on their level of autonomy. We begin by describing what an end-to-end machine learning pipeline actually looks like, and which subtasks of the machine learning pipeline have been automated so far. We highlight those subtasks that are still done manually—generally by a data scientist—and explain how this limits domain experts’ access to machine learning. Next, we introduce our novel level-based taxonomy for AutoML systems and define each level according to the scope of automation support provided. Finally, we lay out a roadmap for the future, pinpointing the research required to further automate the end-to-end machine learning pipeline and discussing important challenges that stand in the way of this ambitious goal.

2021 ◽  
Vol 70 ◽  
pp. 409-472
Author(s):  
Marc-André Zöller ◽  
Marco F. Huber

Machine learning (ML) has become a vital part in many aspects of our daily life. However, building well performing machine learning applications requires highly specialized data scientists and domain experts. Automated machine learning (AutoML) aims to reduce the demand for data scientists by enabling domain experts to build machine learning applications automatically without extensive knowledge of statistics and machine learning. This paper is a combination of a survey on current AutoML methods and a benchmark of popular AutoML frameworks on real data sets. Driven by the selected frameworks for evaluation, we summarize and review important AutoML techniques and methods concerning every step in building an ML pipeline. The selected AutoML frameworks are evaluated on 137 data sets from established AutoML benchmark suites.


2020 ◽  
Vol 35 (33) ◽  
pp. 2043005
Author(s):  
Fernanda Psihas ◽  
Micah Groh ◽  
Christopher Tunnell ◽  
Karl Warburton

Neutrino experiments study the least understood of the Standard Model particles by observing their direct interactions with matter or searching for ultra-rare signals. The study of neutrinos typically requires overcoming large backgrounds, elusive signals, and small statistics. The introduction of state-of-the-art machine learning tools to solve analysis tasks has made major impacts to these challenges in neutrino experiments across the board. Machine learning algorithms have become an integral tool of neutrino physics, and their development is of great importance to the capabilities of next generation experiments. An understanding of the roadblocks, both human and computational, and the challenges that still exist in the application of these techniques is critical to their proper and beneficial utilization for physics applications. This review presents the current status of machine learning applications for neutrino physics in terms of the challenges and opportunities that are at the intersection between these two fields.


Author(s):  
Ricardo Vilalta ◽  
Tomasz Stepinski

Spacecrafts orbiting a selected suite of planets and moons of our solar system are continuously sending long sequences of data back to Earth. The availability of such data provides an opportunity to invoke tools from machine learning and pattern recognition to extract patterns that can help to understand geological processes shaping planetary surfaces. Due to the marked interest of the scientific community on this particular planet, we base our current discussion on Mars, where there are presently three spacecrafts in orbit (e.g., NASA’s Mars Odyssey Orbiter, Mars Reconnaissance Orbiter, ESA’s Mars Express). Despite the abundance of available data describing Martian surface, only a small fraction of the data is being analyzed in detail because current techniques for data analysis of planetary surfaces rely on a simple visual inspection and descriptive characterization of surface landforms (Wilhelms, 1990). The demand for automated analysis of Mars surface has prompted the use of machine learning and pattern recognition tools to generate geomorphic maps, which are thematic maps of landforms (or topographical expressions). Examples of landforms are craters, valley networks, hills, basins, etc. Machine learning can play a vital role in automating the process of geomorphic mapping. A learning system can be employed to either fully automate the process of discovering meaningful landform classes using clustering techniques; or it can be used instead to predict the class of unlabeled landforms (after an expert has manually labeled a representative sample of the landforms) using classification techniques. The impact of these techniques on the analysis of Mars topography can be of immense value due to the sheer size of the Martian surface that remains unmapped. While it is now clear that machine learning can greatly help in automating the detailed analysis of Mars’ surface (Stepinski et al., 2007; Stepinski et al., 2006; Bue and Stepinski, 2006; Stepinski and Vilalta, 2005), an interesting problem, however, arises when an automated data analysis has produced a novel classification of a specific site’s landforms. The problem lies on the interpretation of this new classification as compared to traditionally derived classifications generated through visual inspection by domain experts. Is the new classification novel in all senses? Is the new classification only partially novel, with many landforms matching existing classifications? This article discusses how to assess the value of clusters generated by machine learning tools as applied to the analysis of Mars’ surface.


2021 ◽  
Author(s):  
Doha Naga ◽  
Wolfgang Muster ◽  
Eunice Musvasva ◽  
Gerhard F. Ecker

Abstract Unpredicted drug safety issues constitute the majority of failures in the pharmaceutical industry according to several studies[1-3]. Some of these preclinical safety issues could be attributed to the non-selective binding of compounds to targets other than their intended therapeutic target, causing undesired adverse events. Consequently, pharmaceutical companies including Roche, routinely run in-vitro safety screens to detect off-target activities prior to preclinical and clinical studies.Hereby we present a machine learning framework aiming at the prediction of our in-house 50 off-target panel[4] activities for ~ 4000 compounds, directly from their structure. This framework is intended to guide chemists in the drug design process prior to synthesis and accelerate drug discovery. It incorporates different ML approaches such as deep learning and automated machine learning. Outcomes from different methods are compared in terms of efficiency and efficacy. The most important challenges and factors impacting model construction and performance in addition to suggestions on how to overcome such challenges are also discussed.


Author(s):  
Stephen Mwanje ◽  
Marton Kajo ◽  
Benedek Schultz

Author(s):  
Suraya Masrom ◽  
Masurah Mohamad ◽  
Shahirah Mohamed Hatim ◽  
Norhayati Baharun ◽  
Nasiroh Omar ◽  
...  

<span lang="EN-US">Automated machine learning is a promising approach widely used to solve classification and prediction problems, which currently receives much attention for modification and improvement. One of the progressing works for automated machine learning improvement is the inclusion of evolutionary algorithm such as Genetic Programming. The function of Genetic Programming is to optimize the best combination of solutions from the possible pipelines of machine learning modelling, including selection of algorithms and parameters optimization of the selected algorithm.  As a family of evolutionary based algorithm, the effectiveness of Genetic Programming in providing the best machine learning pipelines for a given problem or dataset is substantially depending on the algorithm parameterizations including the mutation and crossover rates. This paper presents the effect of different pairs of mutation and crossover rates on the automated machine learning performances that tested on different types of datasets. The finding can be used to support the theory that higher crossover rates used to improve the algorithm accuracy score while lower crossover rates may cause the algorithm to converge at earlier stage.</span>


Sign in / Sign up

Export Citation Format

Share Document